Re Item III

No basis was found for "on adding the image to the reference image, setting the higher 
resolution data to zero" in the encoder (claim 1 and claim 25). This feature is even in 
contradiction with the description. Claims 1-25 are therefore not supported by the 
description as required by Article 6 PCT. Furthermore this amendment introduces 
subject-matter which extends beyond the content of the application as filed, contrary to 
Article 34(2)(b) PCT. 

In the paragraph bridging pages 14 and 15 of the description, it is explained that the
decoder will set to zero the higher resolution data that are not coded for a current
image and which were coded in the reference image. This is needed to ensure the
visual quality of the decoded picture in case of shortage of bandwidth, because the
reference image used by the decoder is not the same as the reference image used by
the encoder, as a skilled person would understand from reading pages 8 to 14 of the
description (the decoder will have a lower resolution reference image).
No clear indication could be found in the description that the encoder includes a 
mechanism to set the higher resolution data of the reference image in the coding loop 
to zero. 

The applicant tries to define a coding method with actions that are performed in a 
decoder.

Claims

1. An image coding method comprising generating an ordered sequence of coded
image data, the sequence beginning with coded data representative of an area of the
image having high importance, and ending with coded data representative of an area
of the image having lower importance.

2. An image coding method according to claim 1, wherein the importance of the
image areas represented by the coded data decreases gradually over the ordered
sequence.

3. An image coding method according to claim 2, wherein the image data coding 
sequence is arranged in a substantially spiral configuration centred on the area of 
importance. 

4. An image coding method according to any preceding claim, wherein the area 
of importance is at a location selected as the most likely centre point of foveated 
vision of a viewer of the image. 

5. An image coding method according to claim 4, wherein the area of importance 
is at a centre point of the image. 

6. An image coding method according to any preceding claim, wherein the 
method includes converting an image into a multi-resolution representation, different 
resolution representations of the image being coded in sequence, the order of the 
sequence being determined to reflect psychophysical aspects of human vision. 

7. An image coding method according to claim 6, wherein according to the
sequence a luminance representation of the image is coded before chrominance
representations of the image.

8. An image coding method according to claim 7, wherein for a given level of 
resolution, the luminance representation is arranged to include more resolution than 
the chrominance representations.

9. An image coding method according to any of claims 6 to 8, wherein the
multi-resolution representation is generated using a wavelet transform, and the coding
sequence comprises wavelet representations of the image which increase from a low
level of resolution to a high level of resolution.

10. An image coding method according to claim 9, wherein wavelet orientations 
of horizontal and vertical image components are coded before wavelet orientations of 
diagonal image components. 

11. An image coding method according to claim 10, wherein wavelet orientations 
of diagonal image components of a given level of resolution are coded after wavelet 
orientations of horizontal and vertical image components of a higher resolution. 

12. An image coding method according to any preceding claim, wherein the 
method is implemented as part of a communications system, and the amount of coded 
information output by the method for a given image is determined on an image by 
image basis in accordance with the available bandwidth of the communications 
system. 

13. An image coding method according to claim 12, wherein where necessary in 
order to fully utilise the available bandwidth of the communications system includes a 
truncated sequence of coded image data, image data representative of areas of least
importance having been excluded from the truncated sequence. 

14. An image coding method according to any preceding claim, wherein a 
predetermined code is added to a sequence to indicate the end of image data 
representative of a particular aspect of the image. 

15. An image coding method according to any preceding claim, wherein the image 
is one of a sequence of images, the image is compared to a reference image 
determined using preceding images of the sequence, and the coding method is used to 
code differences between the image and the reference image.

16. An image coding method according to any preceding claim, wherein scalar
quantisation is used to minimise the amount of image data to be coded, the scalar 
quantisation being based upon a psychophysical model. 

17. An image coding method according to any preceding claim, wherein the 
method includes an estimation of motion within an image as compared with a
reference image, and the estimated motion is included in the coded image data. 

18. An image coding method according to claim 17, wherein the method includes 
a choice between image data that has been coded using motion estimation and data 
that has been coded without using motion estimation, the choice being made upon the 
basis of minimising distortion of the coded image. 

19. An image coding method according to any preceding claim, wherein the 
method includes vector quantisation of the image, the vector quantisation being 
implemented using a self organising neural map to provide image data in the form of 
indices of a codebook. 

20. An image coding method according to claim 9 and claim 19, wherein a 
threshold is applied to the magnitude of wavelet coefficients, and those which fall 
below the threshold are converted to zero coefficients. 

21. An image coding method according to claim 9 and claim 19, wherein different 
codebooks are used for different sub-bands of the wavelet representation of the image. 

22. An image coding method according to any of claims 19 to 21, wherein the 
indices of the codebook are subsequently coded using variable length entropy coding. 

23. An image coding method according to claim 22, wherein a series of zero 
indices followed by a non-zero index is coded as a pair of values by the variable 
length entropy coding, a first value representing the number of zero indices in the 
series and the second value representing the value of the non-zero index.

24. An image coding method according to claim 22 or claim 23, wherein a
threshold is applied to the indices of the codebook, and those indices which fall below
the threshold are converted to zero indices.

25. An image coding method according to claim 24, wherein wavelet coefficients 
which fall above the threshold are reduced by the value of the threshold. 

26. A method of decoding an image coded in accordance with any preceding 
claim, wherein where a truncated sequence of coded image data is received, the 
decoder decodes the image using the truncated sequence of coded image data and uses 
zero values in place of missing coded image data. 

27. A method of decoding an image according to claim 26, wherein the coded 
image is a difference image which is added to a reference image to generate a 
decoded image, and artefacts at higher resolutions of the decoded image caused by the 
truncated sequence are removed by setting the higher resolution data to zero. 

28. An image coding method substantially as hereinbefore described with 
reference to the accompanying figures.

Image Coding

The present invention relates to a method of image coding. The method may be
applied to, but is not limited to, coding of a series of video frames (where the term
video refers to temporally separated images).

A primary problem addressed by video coding is how to reduce the number of bits
needed to represent video frames. This problem arises in communications
applications as a result of bandwidth restriction of communications systems, and
arises in storage applications from limited storage capacity of storage media. A
quarter common interchange format (QCIF - 176 x 144 pixels) colour video stream
comprises 25 frames per second (25fps), each pixel of the frame being represented by
8 bits (8bpp). Transmission of a QCIF colour video stream would require a
communications bandwidth of ≈14.5M bits/s. A 1 hour recording of a QCIF colour
video stream would require ≈6.4G Bytes of storage. There are many applications in
which the available bandwidth or available storage capacity is orders of magnitude
less than these values.
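
The approximate figures quoted above can be checked with a short calculation. The sketch below is illustrative only. It assumes 24 bits per pixel (8 bits for each of three colour components) and binary prefixes (1M = 2^20 bits, 1G = 2^30 bytes), because these assumptions reproduce the quoted values; the variable names are not taken from the description.

```python
# Illustrative arithmetic only. The 24 bits-per-pixel figure and the binary
# prefixes are assumptions, chosen because they reproduce the quoted values.
WIDTH, HEIGHT = 176, 144          # QCIF frame size in pixels
FRAMES_PER_SECOND = 25
BITS_PER_PIXEL = 3 * 8            # assumed: 8 bits for each of 3 colour components

bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SECOND
mbits_per_second = bits_per_second / 2**20
gbytes_per_hour = bits_per_second * 3600 / 8 / 2**30

print(f"{mbits_per_second:.1f} Mbit/s")     # about 14.5
print(f"{gbytes_per_hour:.1f} GByte/hour")  # about 6.4
```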



The development of a video coding method capable of coding video frames using
small amounts of data is of considerable commercial importance. In particular, there
is a demand for a video coding method capable of coding video frames for
transmission by a low-bandwidth communication channel. Low bandwidth is defined
here as a bandwidth of 64kbits/s or less (this corresponds to an ISDN basic rate
channel). Other low bandwidths for which video coding methods are required include
28.8k bits/s, which is typically utilised by computer modems, and 9.6k bits/s or less,
which is used for mobile telephony applications. A video coding method must
compress video frames to an extreme degree in order to transmit the video frames
over a low bandwidth communications channel. For example, for a 64k bits/s channel
the QCIF colour video stream requires an average compression ratio of 232:1, a 28.8k
bits/s channel requires 528:1 and a 10k bits/s channel, 1485:1.
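
The compression ratios quoted above follow from dividing the raw bit rate of the QCIF stream by the channel bit rate. The sketch below is illustrative only and rests on two assumptions inferred from the numbers rather than stated in the description: the raw stream uses 24 bits per pixel, and the 64k and 10k channel rates are read with k = 1024 while the 28.8k rate is read with k = 1000.

```python
# Illustrative arithmetic only. The raw rate assumes 24 bits per pixel, and
# the choice of k = 1024 or k = 1000 per channel is inferred from the
# quoted ratios, not stated in the description.
RAW_BITS_PER_SECOND = 176 * 144 * 24 * 25


def compression_ratio(channel_bits_per_second):
    """Return the raw-to-channel bit rate ratio for a QCIF colour stream."""
    return RAW_BITS_PER_SECOND / channel_bits_per_second


ratio_64k = round(compression_ratio(64 * 1024))
ratio_28k8 = round(compression_ratio(28_800))
ratio_10k = round(compression_ratio(10 * 1024))

print(ratio_64k, ratio_28k8, ratio_10k)  # 232 528 1485
```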

[...] an unchanging background. Redundancy of this type is known as objective
information redundancy. Existing video coding methods take advantage of objective
information redundancy to reduce the number of bits needed to represent video
[...] representation in which a minimal amount of information may be used to exactly
reconstruct the video frames, with any objective information redundancy being
removed from the representation.

Where a low-bandwidth communication channel is to be used (i.e. 64kbits/s or less), 
or a low capacity storage medium is to be used, information other than objectively 
redundant information is removed from representations of video frames. This will 
inevitably lead to degradation of the reconstructed image frames; there is an
unavoidable trade-off between distortion and bit rate. The problem addressed by low
bit-rate video coding methods is how to minimise distortion, and where there is
distortion make that distortion as acceptable as possible. 

Popular video compression methods are derivatives of one another, including MPEG-1,
MPEG-2, H.261 and H.263 [H.261: 'ITU-T Recommendation H.261, video codec for
audiovisual services at p x 64 kbit/s', Geneva, 1990; H.263: 'ITU-T Recommendation
H.263, video coding for low bit rate communication', February 1998; MPEG-1:
'Coding of moving pictures and associated audio for digital storage media at up to
about 1.5 Mbit/s', ISO/IEC 11172-2 Video, November 1991; MPEG-2: 'Generic
coding of moving pictures and associated audio information', ISO/IEC 13818-2
Video, Draft International Standard, November 1994]. These methods operate in the
spatial domain, and include motion compensation and a first-order coding loop (i.e.
differences between successive images are coded). Discrete cosine transforms (DCT)
of difference images are uniformly quantised and then entropy coded. The resulting
bit stream is of variable length. Where a communications channel operates at a
constant bit rate (as is conventional), a buffer is used to decouple the bit stream from
the communications channel. The average bit rate from the algorithm is maintained at
the channel bit rate by regulating the scalar quantisation of the difference image DCT
coefficients, coarser quantisation providing higher distortion but lower bit rates. The
number of bits generated cannot be accurately predicted, and this is why a buffer is
needed to maintain an average bit rate. Where a drastic correction of the bit rate is
required, the frame rate is instantaneously decreased. The channel buffer introduces
delay which becomes significant at very low bit-rates on narrow bandwidth channels.



Existing very low bit rate (VLBR) video coding methods are currently formulated
around extensions of the H.263 algorithm that also forms the basis for the MPEG-4
type algorithms [H.263: 'ITU-T Recommendation H.263, video coding for low bit
rate communication', February 1998; MPEG-4: 'MPEG-4 video verification model
version-11', ISO/IEC JTC1/SC29/WG11, N2171, Tokyo, March 1998]. The H.263
coding algorithm is designed to operate over fixed copper land lines with a 28.8kbits/s
modem. These channels are assumed to be error free and are assumed to have a
constant bandwidth.

With the growth of the Internet and mobile telephone markets there is a demand to
deliver 'live' video content over channels that have significantly different channel
characteristics compared to fixed copper land lines. The Internet has a very wide
effective bandwidth dynamic range from a few hundred bytes/s to several kilo bytes/s.
This bandwidth may change instantaneously. Third generation (3G) mobile
telephones operate with a broadband Code Division Multiple Access (CDMA)
system which has a bandwidth that may change instantaneously. Typically, changes
of effective bandwidth are caused by increased congestion or increased bit error rates.

Existing video coding methods developed for the Internet and mobile telephone
systems comprise adaptations of the H.263 algorithm. These methods suffer from the
disadvantage that they require a channel buffer, and therefore include a delay which
affects 'live' transmission of video and in particular live two-way communication.
The buffer also affects the ability of the methods to respond to instantaneously
changing channel bandwidth. A further disadvantage of video coding methods based
upon the H.263 algorithm is that they cannot deliver information at a continuously
variable bit rate, but instead jump between different bit rates in coarse steps, typically
by varying the number of frames per second. Where there are many differences
between images, for example if a fast moving object crosses the image, then the frame
[...] image is lost.
Error concealment strategies used by video coding methods based upon the H.263
algorithm are spatial and therefore show up as corrupted regions in reconstructed
image frames.

It is an object of the present invention to provide an image coding method which
overcomes at least one of the above disadvantages.

According to the invention there is provided an image coding method comprising
generating an ordered sequence of coded image data, the sequence beginning with
coded data representative of an area of the image having high importance, and ending
with coded data representative of an area of the image having lower importance.

Preferably, the importance of the image areas represented by the coded data decreases 
gradually over the ordered sequence. 

Preferably, the image data coding sequence is arranged in a substantially spiral 
configuration centred on the area of importance. 

Preferably, the area of importance is at a location selected as the most likely centre 
point of foveated vision of a viewer of the image. 

Preferably, the area of importance is at a centre point of the image. 

Preferably, the method includes converting an image into a multi-resolution
representation, different resolution representations of the image being coded in 
sequence, the order of the sequence being determined to reflect psychophysical 
aspects of human vision. 

Preferably, according to the sequence a luminance representation of the image is 
coded before chrominance representations of the image. 

Preferably, for a given level of resolution, the luminance representation is arranged to 
include more resolution than the chrominance representations. 

Preferably, the multi-resolution representation is generated using a wavelet transform,
and the coding sequence comprises wavelet representations of the image which
increase from a low level of resolution to a high level of resolution.

Preferably, wavelet orientations of horizontal and vertical image components are
coded before wavelet orientations of diagonal image components.

Preferably, wavelet orientations of diagonal image components of a given level of
resolution are coded after wavelet orientations of horizontal and vertical image
components of a higher resolution.

Preferably, the method is implemented as part of a communications system, and the
amount of coded information output by the method for a given image is determined
on an image by image basis in accordance with the available bandwidth of the
communications system.

Preferably, where necessary in order to fully utilise the available bandwidth of the
communications system, the coded information output by the method includes a
truncated sequence of coded image data, image data representative of areas of least
importance having been excluded from the truncated sequence.

Preferably, a predetermined code is added to a sequence to indicate the end of image
data representative of a particular aspect of the image.

Preferably, the image is one of a sequence of images, the image is compared to a 
reference image determined using preceding images of the sequence, and the coding 
method is used to code differences between the image and the reference image. 

Preferably, scalar quantisation is used to minimise the amount of image data to be
coded, the scalar quantisation being based upon a psychophysical model.

Preferably, the method includes an estimation of motion within an image as compared
with a reference image, and the estimated motion is included in the coded image data.

Preferably, the method includes a choice between image data that has been coded
using motion estimation and data that has been coded without using motion
estimation, the choice being made upon the basis of minimising distortion of the
coded image.

Preferably, the method includes vector quantisation of the image, the vector 
quantisation being implemented using a self organising neural map to provide image 
data in the form of indices of a codebook.

Preferably, a threshold is applied to the magnitude of wavelet coefficients, and those 
which fall below the threshold are converted to zero coefficients. 

Preferably, different codebooks are used for different sub-bands of the wavelet 
representation of the image. 

Preferably, the indices of the codebook are subsequently coded using variable length 
entropy coding. 

Preferably, a series of zero indices followed by a non-zero index is coded as a pair of 
values by the variable length entropy coding, a first value representing the number of 
zero indices in the series and the second value representing the value of the non-zero
index. 

Preferably, a threshold is applied to the indices of the codebook, and those indices 
which fall below the threshold are converted to zero indices.

Preferably, wavelet coefficients which fall above the threshold are reduced by the 
value of the threshold. 

Preferably, the invention further comprises a method of decoding an image coded as
described above, wherein where a truncated sequence of coded image data is received,
the decoder decodes the image using the truncated sequence of coded image data and
uses zero values in place of missing coded image data.

Preferably, the coded image is a difference image which is added to a reference image
to generate a decoded image, and artefacts at higher resolutions of the decoded image
caused by the truncated sequence are removed by setting the higher resolution data to
zero.
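
The decoding of a truncated sequence described above may be illustrated as follows. This is a minimal sketch, not the decoder of the embodiment. It assumes that the decoder knows how many codebook indices a complete sequence would contain, and it works on plain integer indices rather than on an entropy coded bit stream; the function names are hypothetical.

```python
def truncate_to_budget(indices, max_indices):
    """Keep only the first max_indices entries (the most important ones)."""
    return list(indices[:max_indices])


def fill_truncated(received_indices, expected_length):
    """Pad a truncated index sequence with zeros up to the expected length.

    Assumption: the decoder knows expected_length in advance.
    """
    if len(received_indices) > expected_length:
        raise ValueError("received more indices than expected")
    missing = expected_length - len(received_indices)
    return list(received_indices) + [0] * missing


# Ordered sequence, most important data first (toy values).
full_sequence = [7, 3, 0, 5, 1, 2]
sent = truncate_to_budget(full_sequence, 4)
decoded = fill_truncated(sent, len(full_sequence))
print(decoded)  # [7, 3, 0, 5, 0, 0]
```

Because the sequence is ordered by importance, the zeros substituted by the decoder always stand in for the least important data.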



A specific embodiment of the invention will now be described by way of example
only, with reference to the accompanying drawings in which:

Figure 1 is a block diagram representing a basic embodiment of the method;

Figure 2 is a schematic illustration showing the order in which wavelet coefficients
are coded by the method;

Figure 3 is a schematic illustration showing the order in which an image coded using
the method can subsequently be decoded;

Figure 4 is a block diagram representing an extended embodiment of the method,
which includes motion compensation;

Figure 5 is a block diagram representing choice between the basic embodiment of the
method and the extended embodiment of the method;

Figure 6 is a schematic diagram representing a two dimensional inverse transform,
used to predict quantisation errors;

Figure 7 is a schematic representation of general vector quantisation encoding;

Figure 8 is a schematic representation of general vector quantisation encoding
including noise;

Figure 9 is a graph representing distortion as a function of average bit rate, for
different sizes of self organising neural maps as used by the embodiments of the
invention;

Figure 10 is a first frame of a first sequence of video test images;

Figure 11 is a first frame of a second sequence of video test images;

Figure 12 is a graph showing the performance of the embodiments of the invention at
10kbits/second;

Figure 13 is a graph showing the performance of the embodiments of the invention at
28.8kbits/second; and

Figure 14 is a graph showing the performance of the embodiments of the invention at
[...].

The method converts images (frames) of a video sequence to a coded representation.
The method is arranged to code the images at a desired rate, so that coded images are
presented to an input at the required frame rate. The method operates in a first order
coding loop where only the instantaneous first derivative of the information is coded:

    r(z) = f(z)(1 - z^{-1})

In other words, an image to be coded is compared with a reference image, and
differences between the images are coded. The efficiency of the coding loop relies on
temporal redundancy between images in a sequence. Higher frame rates have greater
temporal redundancy than lower rates for the same change of viewing scene, and the
use of a first order coding loop is suited to frame rates of 2 frames/s or more.
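
The principle of a first order coding loop may be sketched as follows. The sketch is illustrative only. It omits the wavelet transform and all quantisation, so the loop shown is lossless; it also assumes an all-zero initial reference image. The function names are hypothetical.

```python
def encode_difference(frame, reference):
    """Return the element-wise difference between a frame and the reference."""
    return [pixel - ref for pixel, ref in zip(frame, reference)]


def decode_difference(difference, reference):
    """Rebuild a frame by adding the difference back onto the reference."""
    return [diff + ref for diff, ref in zip(difference, reference)]


def run_coding_loop(frames):
    """Code each frame as a difference from the previously decoded frame."""
    reference = [0] * len(frames[0])  # assumed all-zero initial reference
    differences, decoded_frames = [], []
    for frame in frames:
        difference = encode_difference(frame, reference)
        decoded = decode_difference(difference, reference)
        differences.append(difference)
        decoded_frames.append(decoded)
        reference = decoded  # the decoded frame becomes the next reference
    return differences, decoded_frames


# Three toy "frames" of four samples each, changing only slightly over time.
frames = [[10, 10, 12, 50], [10, 11, 12, 50], [10, 11, 13, 52]]
differences, decoded_frames = run_coding_loop(frames)
print(differences)  # [[10, 10, 12, 50], [0, 1, 0, 0], [0, 0, 1, 2]]
```

After the first frame the differences are mostly zero, which is the temporal redundancy on which the coding loop relies.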

A basic embodiment of the method is described, followed by an extended 
embodiment of the method which includes motion compensation. 

A block diagram of the basic method is shown in figure 1. An image to be coded is 
first transformed to a discrete wavelet representation. Scalar quantisation is then used 
to remove redundant information from the input image. The quantisation is based 
upon a psychophysical model, and quantisation factors used are selected so that errors 
introduced by the quantisation are below a visibility threshold. Differences between
the (transformed) image and a predicted reference image are determined, and a 
resulting difference image is vector quantised to generate a set of codebook indices. 
The codebook indices are represented by indices of a variable length entropy 
codebook. 

Because the codebook indices are represented using a variable length entropy
codebook, maximum objective information exploitation is achieved when the result of
the image difference calculation produces small similar valued numbers. From a
practical perspective this means mostly zeros and, if not zero, as close to zero as
possible. The efficiency of the method is therefore dependent to a large extent upon
the input to the vector quantiser. The image difference calculation carried out prior to
vector quantisation is 'encouraged' to produce small values by firstly stripping as
much redundant information from the input image as possible (using the
psychophysical quantisation step), and then secondly subtracting from it a best
possible prediction of that input (i.e. the predicted reference image).
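
The benefit of a mostly-zero index sequence can be illustrated with the pairing described earlier, in which a series of zero indices followed by a non-zero index is represented by a pair of values. The sketch below is illustrative only. It produces the pairs but does not assign variable length codes to them, and the handling of zeros that trail the last non-zero index is an assumption; the function names are hypothetical.

```python
def run_length_pairs(indices):
    """Convert indices into (zero_run, non_zero_index) pairs.

    Returns the pairs and the count of trailing zeros, which are kept
    separately because no non-zero index follows them (an assumption).
    """
    pairs = []
    zero_run = 0
    for index in indices:
        if index == 0:
            zero_run += 1
        else:
            pairs.append((zero_run, index))
            zero_run = 0
    return pairs, zero_run


def expand_pairs(pairs, trailing_zeros=0):
    """Invert run_length_pairs to recover the original index sequence."""
    indices = []
    for zero_run, value in pairs:
        indices.extend([0] * zero_run)
        indices.append(value)
    indices.extend([0] * trailing_zeros)
    return indices


indices = [0, 0, 5, 0, 3, 3, 0, 0, 0]
pairs, trailing = run_length_pairs(indices)
print(pairs, trailing)  # [(2, 5), (1, 3), (0, 3)] 3
```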

The method attempts to process images in a manner similar to the human visual
system. This allows the method to maximise the removal of subjective information
redundancy, i.e. information to which the human visual cortex does not respond.
Such information is defined by non-conscious processing of the brain such as edge
and texture separation, frequency sensitivity and masking effects. The term
non-conscious is used here to differentiate it from the Freudian unconscious terminology.

From a simple model of the human visual system, the first area of subjective
information redundancy exists in the eye's sensor mechanism. The eye is conditioned
to be more sensitive to luminance information than chrominance. Therefore the
method is provided with a front end (not shown in figure 1) which represents the
images by separate luminance (Y) and chrominance (U and V) components, the
chrominance spatial resolution being a factor of two less than the luminance
resolution. The nomenclature for this colour space is represented by the ratios of the
components as YUV 4:1:1.
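
A reduction of chrominance resolution by a factor of two may be sketched as follows. The sketch is illustrative only. It assumes that the reduction is applied in both the horizontal and the vertical direction and that it is performed by averaging 2 x 2 blocks; the description does not state how the front end performs the reduction. The function name is hypothetical.

```python
def subsample_by_two(plane):
    """Halve the resolution of a plane by averaging each 2 x 2 block.

    Assumption: the plane has an even number of rows and columns.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Toy 4 x 4 chrominance (U) plane; the luminance plane would be left at
# full resolution.
u_plane = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
u_small = subsample_by_two(u_plane)
print(u_small)  # [[2.0, 6.0], [2.0, 4.0]]
```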

The visual cortex is responsible for non-conscious processing of visual information
presented to it by the eye's sensor mechanism. The visual information is separated
into textures and edges of various orientations which are processed at differing
resolutions. The method uses multiresolution wavelet transform because it simulates
the way in which non-conscious vision processing is performed. The sensitivity of
the cortex to sub-bands of the wavelet transformed image is not constant, i.e. the
cortex is less sensitive to diagonal lines than it is to horizontal and vertical lines. The
order in which the multiresolution wavelet transform is coded reflects this, with
horizontal and vertical detail being given priority over diagonal detail.
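
One sub-band ordering that is consistent with these priorities may be sketched as follows. The sketch is illustrative only and is not asserted to be the exact order used by the embodiment. It assumes that level numbers decrease as resolution increases, that diagonal detail of a given level is deferred until the horizontal and vertical detail of the next higher resolution has been coded, and that the labels HL and HH denote the vertical and diagonal sub-bands; the function name is hypothetical.

```python
def subband_order(levels):
    """Return (sub-band, level) pairs in one possible coding order.

    Level `levels` is the lowest resolution and level 1 the highest.
    Assumed labels: LL = thumbnail, LH = horizontal, HL = vertical,
    HH = diagonal.
    """
    order = [("LL", levels), ("LH", levels), ("HL", levels)]
    for level in range(levels - 1, 0, -1):
        # Horizontal and vertical detail of the finer level come first,
        # then the deferred diagonal detail of the coarser level.
        order += [("LH", level), ("HL", level), ("HH", level + 1)]
    order.append(("HH", 1))
    return order


two_level_order = subband_order(2)
print(two_level_order)
# [('LL', 2), ('LH', 2), ('HL', 2), ('LH', 1), ('HL', 1), ('HH', 2), ('HH', 1)]
```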

The wavelet transform is implemented as a discrete wavelet transform (DWT) which
[...] lifting technique with coefficients derived from 9-7 biorthogonal filter coefficients
known to be suitable for image coding applications [Villasenor J.D., Belzer B., Liao J.,
Wavelet Filter Evaluation for Image Compression, IEEE Transactions on Image
Processing, Vol 4, No. 8, August 1995, pp. 1053 - 1060].

Any other suitable biorthogonal filter coefficients may be used. From an objective
information redundancy viewpoint the choice of biorthogonal filter coefficients is
important when producing image decompositions with the minimum number of
significant valued coefficients. A discussion of how to choose appropriate
biorthogonal filter coefficients is included in the paper by Villasenor, Belzer and Liao
1995 [see above].

The discrete wavelet transform is chosen in preference to a harmonic transform, such 
as a discrete cosine transform, because it is more suited to representing continuous- 
tone images composed mostly of large smooth areas and sharp edged boundaries, as is 
commonly seen in a video coding environment. Harmonic transforms are not suited
to visual data of this sort, particularly at edges, because they produce many significant 
valued coefficients. 

Referring again to figure 1, an input image following the DWT and psychophysical 
quantisation has a stored predicted reference DWT image subtracted from it on a 
wavelet coefficient by wavelet coefficient basis. A set of sub-band vector quantisers, 
one for each resolution level, orientation and colour component, is used to quantise 
the difference coefficient image. The bit rate is partially controlled by a parameter 
that thresholds the vector coefficients before quantisation. The codebook indices are 
entropy coded and therefore the output bit rate is dependent on the operational 
parameters of the vector quantiser. The operationally optimal vector quantiser for 
each sub-band is found by selecting the most suitable operating point on a distortion- 
rate function for that sub-band. Suitability is determined from both sub-space 
coverage and a practical entropy code perspective. 
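
The thresholding and vector quantisation of a block of difference coefficients may be sketched as follows. The sketch is illustrative only. The codebook shown is a toy example rather than one generated by a self organising neural map, the use of squared Euclidean distance as the matching criterion is an assumption, and the preservation of sign during thresholding is also an assumption. The function names are hypothetical.

```python
def soft_threshold(vector, threshold):
    """Zero small coefficients and reduce the rest by the threshold.

    The threshold is applied to the magnitude; the sign is preserved
    (an assumption).
    """
    result = []
    for value in vector:
        magnitude = abs(value)
        if magnitude < threshold:
            result.append(0.0)
        else:
            shrunk = magnitude - threshold
            result.append(shrunk if value >= 0 else -shrunk)
    return result


def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to the vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


# Toy codebook for one sub-band; index 0 is the all-zero vector.
codebook = [[0, 0, 0, 0], [4, 4, 4, 4], [-4, -4, -4, -4], [8, 0, 0, 0]]

quiet_index = nearest_codeword(soft_threshold([0.5, -0.3, 0.2, 0.1], 1.0), codebook)
busy_index = nearest_codeword(soft_threshold([5, 6, 4.5, 5], 1.0), codebook)
print(quiet_index, busy_index)  # 0 1
```

Raising the threshold sends more blocks to the zero index, which lowers the bit rate after entropy coding at the cost of higher distortion.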

The method incorporates two cognitive factors. Firstly, in human vision objects are
recognised as a whole before filling in detail, low resolution information being noted
before high resolution information. For example, a person using video telephony will
note a human head and shoulders before attempting to recognise an individual.
Secondly, human vision tends to be foveated, i.e. the area of greatest visual sharpness
is that point towards which the eyes are directed. This is especially true when
tracking a moving object. For video telephony, it is most likely that a face will be
focussed on and will usually be in the centre of the image. Therefore it is most
profitable for coding of images to be centre biased. The method takes note of the
above aspects of human vision by coding each difference frame vector from the
lowest resolution DWT sub-band to the highest, in a spiral arrangement of wavelet
coefficients which extends from the centre outwards within each sub-band. As
mentioned above, luminance components are favoured over chrominance
components.

Figure 2 represents schematically the order in which wavelets of a difference image
are vector quantised for a two level DWT decomposition, level 2 being a low level of
detail and level 1 being a high level of detail. Figure 2 is for illustrative purposes
only, and it should be noted that the embodiment of the method uses a 4 level DWT
decomposition. Referring to a lower left hand corner of figure 2, the method begins
with the lowest level of detail (box labelled LL) of the difference image (level 2).
This may be thought of as a thumbnail difference image. The method begins with
four wavelet coefficients at a central point of the thumbnail difference image, as
represented by a block labelled A in the centre of the box labelled LL. The wavelet
coefficients are vector quantised by comparing them as a whole with a codebook
specific to the LL sub-band, and allocating an index of the codebook to them (the
index is a single number representative of the four wavelet coefficients). Four
adjacent wavelet coefficients, as represented by a block labelled B are then vector
quantised by comparing them with the codebook specific to the LL sub-band. The
order in which wavelet coefficients are vector quantised is shown by the spiral
arrangement of arrows in the box labelled LL. In this way, a series of codebook
indices are generated in a sequence, the first few indices of the sequence representing
wavelet coefficients of high importance (i.e. wavelets at or close to the centre of the
image), and subsequent indices representing wavelet coefficients of gradually
decreasing importance (i.e. wavelets at or close to edges of the image).

Following vector quantisation of the LL thumbnail difference image, level 2 
information representative of horizontal elements of the difference image (LH) is 
vector quantised. The vector quantisation is carried out for blocks of four wavelet 
coefficients, and is arranged in a spiral. Following vector quantisation of the LH 
difference image, level 2 information representative of vertical elements of the 
difference image (HL) is vector quantised in a similar manner. 

Level 1 information representative of horizontal elements of the difference image 
(LH) is vector quantised in the same manner as described above (for blocks of four 
wavelet coefficients). The level 1 horizontal difference image (LH) is vector 
quantised before the level 2 diagonal difference image (HH), because the level 1 
horizontal difference image information is considered to be more important from a 
psychophysical perspective (the human cortex is not as responsive to diagonal lines as 
it is to either horizontal or vertical lines). 

Level 1 information representative of vertical elements of the difference image (HL) 
is vector quantised, followed by level 2 information representative of diagonal 
elements of the difference image (HH), followed by level 1 information representative 
of diagonal elements of the difference image (HH). 

It should be noted that vector quantisation of the level 2 thumbnail difference image 
(LL) and each subsequent image is carried out first for wavelets representative of the 
luminance (Y) of the image, and then for wavelets representative of the two 
chrominance components (U and V) of the image. 

The indices generated by the vector quantisation are coded in real time using variable 
length entropy coding, as represented in figure 1 (the method does not include a 
buffer). Coded information representative of the image begins with the most 
important information relating to the image, and decreases gradually to the least 
important information relating to the image. 

The order in which coded information relating to an image is generated by the method 
is as follows: wavelet coefficients of a luminance (Y) representation of a level 4 
thumbnail difference image (LL) starting at the centre of the thumbnail image and 
spiralling outwards (this is done for every difference image), wavelet coefficients of a 
first chrominance (U) representation of the level 4 thumbnail difference image (LL), 
wavelet coefficients of a second chrominance (V) representation of the level 4 
thumbnail difference image (LL) (all difference images are represented separately as 
Y, U and V wavelet coefficients), level 4 horizontal information (LH) difference 
image, level 4 vertical information (HL) difference image, level 3 horizontal 
information (LH) difference image, level 3 vertical (HL) difference image, level 4 
diagonal information (HH) difference image, level 2 horizontal information (LH) 
difference image, etc. 
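
The sub-band ordering listed above can be generated programmatically. The sketch below is an illustration under an assumption: the text stops at "etc.", so the continuation beyond the level 2 horizontal sub-band is inferred from the pattern of the two level example (each diagonal sub-band is coded after the horizontal and vertical sub-bands of the next finer level). The function name `subband_order` is hypothetical.

```python
def subband_order(levels):
    """Return the assumed coding order of sub-bands as (band, level) pairs.

    The coarsest thumbnail (LL) comes first. The diagonal band (HH) of a level
    is delayed until the LH and HL bands of the next finer level are coded.
    """
    order = [("LL", levels)]
    for level in range(levels, 0, -1):
        order.append(("LH", level))   # horizontal elements
        order.append(("HL", level))   # vertical elements
        if level < levels:
            order.append(("HH", level + 1))   # delayed diagonal band
    order.append(("HH", 1))           # finest diagonal band is coded last
    return order
```

For `levels=2` this gives LL2, LH2, HL2, LH1, HL1, HH2, HH1, and for `levels=4` the first seven entries agree with the sequence stated in the text. Within each sub-band the Y component would be coded before U and V.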

The majority of the wavelet coefficients for a given image will be zeros. This is 
especially true when differences between image frames are small, since the wavelet 
coefficients represent a difference image. Referring to figure 2, a block (i.e. vector) 
of four zero wavelet coefficients is termed a zero vector and is allocated an index of 
zero. Coding of vectors proceeds in sequence, counting the zero vectors until a non-
zero vector is encountered (this is referred to as a zero run). The run of zero vectors 
and a codebook index representing the non-zero vector are entropy coded with 
respective codes. Thus, a difference image containing many zero vectors will be 
entropy coded with a small number of bits. 
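
The zero run counting described above can be sketched as follows. The function name `zero_run_pairs` and the decision to return the trailing run separately are assumptions made for illustration; the entropy coding of each pair is not shown.

```python
def zero_run_pairs(indices):
    """Convert a sequence of codebook indices into (zero_run, index) pairs.

    Index 0 denotes the zero vector. Each pair holds the number of zero
    vectors counted before a non-zero index, followed by that index.
    Zero vectors after the last non-zero index are returned as a trailing run.
    """
    pairs = []
    run = 0
    for index in indices:
        if index == 0:
            run += 1                  # keep counting zero vectors
        else:
            pairs.append((run, index))
            run = 0
    return pairs, run
```

For example, the index sequence 0, 0, 5, 0, 7, 3, 0, 0 yields the pairs (2, 5), (1, 7), (0, 3) with a trailing run of two zero vectors.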

Where the method is used in a communications application, the importance of coded 
information arriving at a receiver will decrease gradually over time for a given image. 
This feature of the method provides a primary advantage, namely that the amount of 
information sent to a receiver can be adjusted to precisely match the bandwidth of a 
communications link. For a given bandwidth, coded information representative of an 
image which fills that bandwidth is transmitted across the communications link, and 
any further information is discarded. 

If the number of bits generated by an entropy coded pair when added to the current 
running total of bits is less than the frame bit limit, then the two entropy codes are 
added to the bit stream. This process continues until the frame bit limit is reached, at 
which point no further bits are transmitted to a receiver. The bit limit will in general 
occur partway through coded information relating to a DWT difference image of a 
detail level (i.e. referring to figure 2, partway through a spiral). 
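
The frame bit limit test described above can be sketched as follows. The names `fill_frame` and `code_len` are hypothetical, and `code_len` stands in for whatever entropy code lengths the coder actually produces; the strict "less than" comparison follows the wording of the text.

```python
def fill_frame(pairs, code_len, bit_limit):
    """Add entropy coded pairs to a frame until the bit limit would be reached.

    pairs     : (zero_run, index) pairs in order of decreasing importance
    code_len  : function returning the number of bits needed to code a pair
    bit_limit : number of bits allowed for this frame
    Returns the pairs that are transmitted and the number of bits used.
    """
    sent = []
    used = 0
    for pair in pairs:
        cost = code_len(pair)
        if used + cost < bit_limit:   # pair still fits within the frame
            sent.append(pair)
            used += cost
        else:
            break                     # everything after this point is discarded
    return sent, used
```

Because the pairs arrive in order of decreasing importance, stopping at the first pair that does not fit discards only the least important information for that frame.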



The decoder includes the inverse of the variable length encoder and the feedback loop 
of figure 1. The image is reconstructed in the same order in which it was coded, i.e. 
beginning with the most important information and adding detail of gradually 
decreasing importance. Thus, where the available bandwidth is low, the level of 
detail of the reconstructed image will be correspondingly low but the 
psychovisually most important parts of the image will be present in the reconstructed 
image. All of the information received at the decoder, including code immediately 
preceding the bit limit for a given image (i.e. referring to figure 2, part of a spiral) is 
used by the decoder to reconstruct the image. 

All vectors beyond the bit limit are assumed to be zero by the decoder. 

The method is particularly applicable to communications links in which the 
bandwidth varies continually and over a wide range, for example the internet. The 
entire bandwidth available at any given time (or a selected portion of the bandwidth) 
is used to transmit coded image information. Known prior art coding methods are not 
capable of being used in this way. 

'End-of-sub-band' and 'end-of-image' markers are included for bit stream 
synchronisation. 'End-of-sub-band' markers allow the receiver to partially decode the 
bit stream such that a 'thumbnail' representation (lower resolution) of the image may 
be viewed during reception. This is particularly useful for monitoring 
communications sessions or scanning stored video databases. 

Artefacts may be generated by the coding method as a result of producing coded 
vectors of variable length per frame in the ordered sequence. During periods of low 
temporal activity the ordered vector sequence will reach high resolution sub-bands of 
the wavelet representation without exceeding the number of bits allowed to code that 
frame (the number of bits may for example be determined by the bandwidth of the 
communication link). If this is followed by a burst of temporal activity, then the 
ordered sequence of coded information will reach only low resolution wavelet sub-
bands (for a communication link of the same bandwidth), and will implicitly code 
zero change in higher wavelet sub-bands. The visual effect of the temporal activity 
will be to leave motionless high frequency edge components superimposed on the 
moving image. This artefact is referred to as the 'shower door' effect. To alleviate 
the effect, the high resolution sub-band vectors in the decoded image are set to zero 
by the decoder from the centre out. The decoder will set to zero any vectors that are 
not coded for a current image and which were coded in a preceding image. Thus, the 
vectors are set to zero for increasing resolutions from the centre of the image 
outwards, until the foveated point reached by the last coded sub-band of the preceding 
image. To allow for differing vector dimensions in each sub-band (i.e. referring to 
figure 2, different sizes of blocks), the foveated point is calculated as a fraction of the 
sub-band ordered sequence. 

The extended embodiment of the method includes motion compensation. Motion 
compensation is used to reduce the amount of information in the difference image for 
more efficient coding. The motion compensation tracks spatial translational motion 
between images in the video sequence. DWT domain motion compensation has some 
advantages over the spatial domain motion compensation (for example as used by the 
H.263 algorithm), in terms of the visual artefacts generated in those cases where the 
approximation error is large. Spatial domain motion compensation produces 
annoying blocking effects, whereas less objectionable speckled noise is produced 
from DWT domain compensation. 

A block diagram of the extended method is shown in figure 4. The basic principle of 
the DWT domain motion estimation and compensation proceeds in the following 
manner. After the subjective information has been removed from the current DWT 
domain image (psychophysical quantisation), motion is estimated with each two-
dimensional block of spatially corresponding coefficients within each sub-band and 
colour component (chrominance estimation is optional) of a stored reference image. 
Within each sub-band the block dimension (i.e. the number of wavelet coefficients to 
which vector quantisation is applied to generate a single index) is chosen to be the 
same as the block dimension of the corresponding vector quantiser (see figure 2), to 
allow the foveated sequence to proceed on a block-by-block basis without overlap. 
Extending the sub-band boundaries with zero valued coefficients permits motion 
vectors from outside the sub-band. This increases the estimation accuracy for the 



(MSE) matching block is written into the reference image to provide a motion 
compensated reference image. The compensated reference image is then subtracted 
from the input image and is vector quantised. The estimated motion vectors are coded 
directly using variable length coding, as shown in figure 4. 
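
The block matching step described above can be sketched as follows, assuming NumPy and an exhaustive search. The function name `estimate_block_motion`, the search range parameter and the tie-breaking rule in favour of the zero motion vector are illustrative assumptions, not details taken from the text.

```python
import numpy as np

def estimate_block_motion(ref_band, cur_band, top, left, size, search):
    """Find the displacement of the best MSE match for one sub-band block.

    ref_band, cur_band : 2-D arrays holding one sub-band of the reference
                         and current DWT images
    top, left, size    : position and dimension of the block in cur_band
    search             : maximum displacement tried in each direction
    The reference sub-band is extended with zero valued coefficients so
    that candidate blocks may lie partly outside the sub-band.
    """
    padded = np.pad(ref_band, search, mode="constant")
    target = cur_band[top:top + size, left:left + size]

    def mse_at(dy, dx):
        r, c = top + dy + search, left + dx + search
        cand = padded[r:r + size, c:c + size]
        return float(np.mean((target - cand) ** 2))

    best_mv, best_mse = (0, 0), mse_at(0, 0)   # start from the zero vector
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            mse = mse_at(dy, dx)
            if mse < best_mse:                 # strict: ties keep earlier vector
                best_mv, best_mse = (dy, dx), mse
    return best_mv, best_mse
```

The matching block found in this way would then be written into the reference image at the block position to form the motion compensated reference.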

Examples of block based motion estimation and compensation which could be used 
are as follows: 

1. Apply the inverse DWT within the coding loop and motion estimate and 
compensate in the spatial domain. This approach ensures the maximum motion 
accuracy but increases the coding complexity [Nosratinia A., Orchard M.T. A 
Multi-Resolution Framework for Backward Motion Compensation. SPIE 
Proceedings on Digital Video Compression: Algorithms and Technologies, Vol. 
2419, 1995, pp. 190 - 200]. 

2. Motion estimate and compensate within the DWT domain and accept the accuracy 
limitations at each level [Zhang Y.Q., Zafar S. Motion-Compensated Wavelet 
Transform Coding for Color Video Compression. IEEE Transactions on Circuits 
and Systems for Video Technology, Vol. 2, No. 3, September 1992, pp. 285 - 
296]. If the MSE of a motion compensated block is greater than that of some 
weighted value of the pixel energy then no motion vector is coded [Mandal M.K., 
Panchanathan S. Motion Estimation Techniques for a Wavelet-based Video Coder. 
SPIE Proceedings on Digital Video Compression: Algorithms and Technologies, 
Vol. 2668, 1996, pp. 122 - 128]. This prevents estimation inaccuracy from 
increasing the bit rate. 

In both approaches advantage may be taken of the multiresolution structure of the 
DWT for inter level prediction to reduce the general multiresolution redundancy in 
natural images and reduce the coding complexity. However, the partial inverse DWT 
may be required for the estimation process because there is no direct evidence that the 
coefficients at lower level sub-bands may be predicted from the higher levels without 
using the reconstructed LxLy sub-bands. 



There is generally little motion between consecutive frames in a video sequence, 
particularly in the background. Therefore zero motion vectors will be statistically 
most likely. The motion vector foveated sequence is coded in a similar manner to the 
zero run length encoding of vectors applied to the indices of the vector quantiser. 
For each non-zero motion vector in the sequence, a zero run length and the two 
motion vector components are entropy coded and added to the bit stream, 
provided the frame bit limit is not exceeded. 

Zero run length encoding of the motion vectors is particularly important for very 
low bit rate video coding algorithms where an added bit overhead could offset the 
quality gained by motion compensation. For small movement within video scenes, as 
is possible in video telephony, many zero and small valued motion vectors will 
consume scarce bit resources. An added problem with block motion estimation is that 
it is possible for the compensated block to produce greater difference image energy 
than without compensation (i.e. more information is required to be coded). An 
uncompensated reference image block is equivalent to a compensated block with a 
zero motion vector. Therefore to improve the predicted reference image and to 
'encourage' the zero motion vector for coding efficiency, a 'best choice' algorithm is 
used. 

The 'best choice' algorithm is implemented in a block mean-square error (MSE) 
sense. The basis of the choice is determined by vector quantising the difference 
image blocks from both the motion compensated and uncompensated images and 
choosing the vector quantised block with the lowest quantisation error. If the 
uncompensated block is chosen, then the zero vector is associated with it. The process 
is diagrammatically illustrated in figure 5. 

The decoder does not require any knowledge of the choice between the 
uncompensated block and compensated block, since the zero vector implicitly refers 
to an uncompensated block. 
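
The 'best choice' decision described above can be sketched as follows, assuming NumPy and a full-search nearest-codeword quantiser. The names `quantise` and `best_choice` are hypothetical, and resolving ties in favour of the uncompensated block is an assumption chosen to favour the zero motion vector.

```python
import numpy as np

def quantise(vector, codebook):
    """Return (index, squared error) of the nearest codeword to vector."""
    errors = np.sum((codebook - vector) ** 2, axis=1)
    index = int(np.argmin(errors))
    return index, float(errors[index])

def best_choice(cur_block, ref_block, comp_block, motion_vector, codebook):
    """Choose between the uncompensated and the compensated reference block.

    Both difference blocks are vector quantised and the one with the lower
    quantisation error is kept. Choosing the uncompensated block is
    signalled by the zero motion vector.
    """
    idx_u, err_u = quantise(cur_block - ref_block, codebook)
    idx_c, err_c = quantise(cur_block - comp_block, codebook)
    if err_c < err_u:
        return motion_vector, idx_c
    return (0, 0), idx_u              # ties favour the zero motion vector
```

The returned motion vector and codebook index are the only values that need to be coded for the block, which is why the decoder needs no separate flag for the choice.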

The 'best choice' algorithm with run length coding of the zero motion vectors ensures 
a minimal bit cost for both low temporal activity and high temporal activity. 

Several parts of the method are described in more detail below. 

Psychophysical quantisation of an image, as represented in figures 1 and 4, is 
described in the following paragraphs. The psychophysical quantisation uses scalar 
quantisation. As mentioned above, the goal of psychophysical quantisation is to 
remove subjective information redundancy. The degree to which 
spatial frequency representations of images may be shaped depends on the 
frequencies observed by a human viewer. 

The two-dimensional DWT used here by the method consists of a frequency and 
orientation sub-division process. The multiresolution filtering and sub-sampling in the 
horizontal and vertical directions divides the signal into octave frequency sub-bands 
with horizontal, vertical and diagonal orientations. The DWT process may therefore 
be considered as a discrete approximation to the physiological process that takes place 
in the model for the human visual cortex [Mallat S.G. Multifrequency Channel 
Decompositions of Images and Wavelet Models. IEEE Transactions on Acoustics, 
Speech, and Signal Processing, Vol. 37, No. 12, December 1989, pp. 2091 - 2110]. 

The embodiment of the method uses an efficient quantisation strategy developed to 
take advantage of the modelled physiological processing of the human visual cortex. 
The visibility of quantisation errors for all colour components at all DWT levels and 
orientations has been determined from a set of subjective experiments [Watson A.B., 
Yang G.Y., Solomon J.A., Villasenor J. Visual Thresholds for Wavelet Quantization 
Error. SPIE Proceedings on Human Vision and Electronic Imaging, B. Rogowitz and 
J. Allebach, Editors, Vol. 2657, Paper No. 44, 1996]. Using this information, a 
psychophysical model customised for the two-dimensional DWT coefficients is 
established. The resulting fitted model for the threshold of visibility, T, as a function 
of spatial frequency, f, and orientation, o, is as follows: 

$$ T(f, o) = a \cdot 10^{\, k \left( \log_{10} \frac{f}{g_o f_0} \right)^2} $$



The approximated model parameters are given in Table 1. 

Table 1

Colour   a       k       f0      gLxLy   gLxHy   gHxLy   gHxHy
Y        0.495   0.466   0.401   1.501   1.0     1.0     0.534
U        1.633   0.353   0.209   1.520   1.0     1.0     0.502
V        0.944   0.521   0.404   1.868   1.0     1.0     0.516



The successive application of the DWT at each level, l, results in halving of the sub-
band frequency bandwidth. For a display with a maximum spatial frequency of $f_{max}$: 

$$ BW_l = f_{max}\, 2^{-l} \quad \text{[cpd]} $$

where cpd indicates cycles per degree of the viewer's vision (this depends upon the 
size of the decoded image and the distance of the viewer from the decoded image). 
The centre frequency of each sub-band is used as the nominal frequency for the 
development of the quantisation process. 
$$ f_l = f_{max}\, 2^{-(l+1)} \quad \text{[cpd]} $$

The visibility of quantisation errors introduced at a particular level and orientation 
may be approximated by the amplitude of the DWT synthesis filters for that level and 
orientation. This approximation is implementation and filter bank dependent. The 
introduction of quantisation errors at level (m+1) for an inverse DWT process is 
shown in figure 6. 

Each sub-band is up-sampled by two in the vertical direction, then a convolution is 
performed with the relevant one-dimensional synthesis filter, multiplied by a factor of 
two and summed to form vertical groups. The vertical groups are similarly up-
sampled followed by the convolution, multiplication by two and summed, but in the 
horizontal direction. The resulting effect on the image at level m is then propagated 
through the remaining image levels to the reconstructed image. Considering only the 
most significant term of the linear convolution process approximates the amplitude of 
the error per level. For example, the effect of an error in a low-pass horizontal and a 
high-pass vertical orientation (LxHy) at one level may be approximated as follows: 
f,-t\ .:; [-■■:'.. 



Therefore the effect of an error at level m for each orientation on the entire 
reconstructed image may be written as: 

E Bj ^»2 ,B, | e na Jt H y 8oM? m * ,> |» 
and, 

A biorthogonal wavelet filter bank that performs well for image compression 
applications is the spline-based set with filter coefficient lengths of nine and seven 
[Villasenor et al., see above]. The approximate amplitudes of error visibility for this 
filter bank with a root-two bias as required for the inverse DWT described above, to 
four levels and all orientations of the DWT process, are given in Table 2. 



Table 2

Orientation   Level 1   Level 2   Level 3   Level 4
LxLy          1.2430    1.5461    1.9224    2.3904
LxHy          1.3447    1.6720    2.0790    2.5851
HxLy          1.3447    1.6720    2.0790    2.5851
HxHy          1.4542    1.8082    2.2483    2.7956
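
The Table 2 values appear to be reproducible from the peak coefficients of the nine and seven tap synthesis filters. The sketch below is offered as a consistency check under stated assumptions: the two peak values are the commonly quoted 9/7 biorthogonal coefficients scaled by root two (they are not given in the text), and the rule "factor of two per two-dimensional stage, one peak coefficient per one-dimensional convolution" is a reading of the inverse DWT description above. The names `H0`, `G0` and `visibility` are hypothetical.

```python
import math

# Assumed peak magnitudes of the 9/7 synthesis filters, root-two normalised.
H0 = 0.557543526229 * math.sqrt(2)   # low-pass synthesis filter peak
G0 = 0.602949018236 * math.sqrt(2)   # high-pass synthesis filter peak

def visibility(level, orientation):
    """Approximate error amplitude V(level, orientation).

    orientation is one of "LxLy", "LxHy", "HxLy", "HxHy". The first
    synthesis stage uses the filters named by the orientation; every
    remaining stage passes the error through the low-pass filter twice.
    """
    peak = {"L": H0, "H": G0}
    first_stage = 2.0 * peak[orientation[0]] * peak[orientation[2]]
    later_stage = 2.0 * H0 * H0
    return first_stage * later_stage ** (level - 1)
```

Under these assumptions the function agrees with every entry of Table 2 to within about 0.0005, for example `visibility(1, "HxHy")` is close to 1.4542 and `visibility(4, "LxLy")` is close to 2.3904.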



A quantisation factor is required for each colour and sub-band such that the resulting 
quantisation error is below the visibility threshold. For a linear quantiser with a factor 
of Q, the worst case error is Q/2. Therefore the quantisation strategy used by the 
method for the psychophysical scalar quantisation is: 

$$ T(f, o) = V(l, o)\, \frac{Q(l, o)}{2} $$

The quantisation visibility term, V, is defined by the DWT process such as that given 
in Table 2. The operational quantisation factors used by the method are formed as 
follows: 

$$ Q(l, o) = \frac{2}{V(l, o)}\, T(f, o) $$

where V(l, o) refers to the level and orientation error visibility values set out in 
Table 2. 

The quantisation factors provide an overall shape that may be applied to the DWT 
coefficients of an image to achieve an imperceptible difference with the original 
image. For low bit rate applications where greater quality loss is tolerated, the 
quantisation shape is uniformly scaled to ensure that the largest errors are confined to 
the least responsive regions of the human psychophysical system. 
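
The calculation of a quantisation factor from the threshold model and the visibility values can be sketched as follows. This is an illustration under assumptions: the threshold is taken to have the form log T = log a + k (log f - log(g f0))^2, which is the form used in the cited visual threshold study, the quantisation factor is taken as Q = 2T/V so that the worst case error Q/2 scaled by V equals the threshold, and the nominal frequency is passed in directly rather than derived from a display. The names `TABLE1_Y`, `threshold` and `quant_factor` are hypothetical.

```python
import math

# Luminance (Y) parameters from Table 1.
TABLE1_Y = {
    "a": 0.495, "k": 0.466, "f0": 0.401,
    "g": {"LxLy": 1.501, "LxHy": 1.0, "HxLy": 1.0, "HxHy": 0.534},
}

def threshold(f, orientation, params):
    """Assumed threshold of visibility T(f, o) for frequency f in cpd."""
    g = params["g"][orientation]
    log_ratio = math.log10(f / (g * params["f0"]))
    return params["a"] * 10 ** (params["k"] * log_ratio ** 2)

def quant_factor(f, orientation, visibility, params):
    """Quantisation factor Q such that visibility * Q / 2 equals T(f, o)."""
    return 2.0 * threshold(f, orientation, params) / visibility
```

For a low bit rate application the resulting factors would all be multiplied by the same scale, which preserves the shape while increasing the error uniformly.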

Referring to figure 1 and figure 4, psychophysical scalar quantisation (as described 
above) is followed by the generation of a difference image. Vector quantisation is 
then applied to the difference image. The following paragraphs describe the vector 
quantisation of the difference image. The method uses a self organising neural map 
(SOM) which through the use of training images provides a solution to the problem of 
how to apply vector quantisation efficiently. 

The generation of a vector quantiser solution requires formulating a cost function to 
be minimised and describing the conditions of optimality for that cost function. 
General vector quantisation theory is discussed herein to lay a foundation for showing 
how the trained SOM used by the method achieves the same optimal solution. A 
noise model of the vector quantiser produces a similar gradient descent training 
algorithm to that of the SOM. Operationally optimal vector quantisers for a signal 
compression environment are then considered: the index entropy is approximated as a 
rate-constraint, which results in a natural extension to the SOM training algorithm. 



A basic encoder-decoder model for a vector quantiser is shown in figure 7 [Gersho A., 
Gray R.M. Vector Quantization and Signal Compression. Kluwer Academic 
Publishers, Boston, 1992]. The processes discussed in the prior art all assume a high 
resolution case where the number of codewords, N, is very large, i.e. $N \to \infty$. 

The vector quantisation problem may be considered as an optimal approximation 
process of an input space to an output space that, in the case of signal compression, is 
itself a subset of the input space. A vector, x, of dimension k from the input space, $X^k$ 
(where $X \in \mathbb{R}$, the set of real numbers) is the input to the encoder. The encoder maps 
the input space to a codebook C that consists of N codewords. The encoding process 
may be written as: $E: X^k \to C$, where $C = \{c_i(x)\}$, $i = 1 \dots N$. The process is fully 
defined by either the index into the codebook, i, or the codeword itself, c(x), and 
therefore it is usual for only the index, or a symbolic representation thereof, to be 
transmitted on the communications channel to the decoder. The encoding process is 
the quantisation process that, in general, is lossy in that the codebook size is limited, 
$|C| = \max\{i\} = N \ll \infty$. The decoder maps the codebook back into the input space 
and may be written as: $D: C \to X^k$, where the reconstructed vector $y \in X^k$. 

The compression mechanism of vector quantisation is achieved by the dimension 
reduction process from a vector in space $X^k$ to an integer index, i. The premise for 
the mechanism is that the signal space covered by x is a sub-space of $X^k$ and that it is 
a stationary random process with an underlying joint probability density function 
(pdf), f(x), such that $f(x) \to 0$ as $x \to \infty$, defined as 
$\{x_1 \to \pm\infty, x_2 \to \pm\infty, \dots, x_k \to \pm\infty\}$. 

Generating optimal or near optimal vector quantisers for the signal sub-space is 
achieved by minimising a suitable cost function in a long term average sense where 
the cost function is itself defined as a stationary random process. If the cost function 
and the sub-space pdf are smooth functions, or may be approximated by smooth 
functions, and hence are differentiable everywhere, then gradient descent methods 
may be used to find near optimal solutions for sometimes intractable analytical 
solutions. The most common approach is to minimise the mean squared error (MSE) 
distortion in a Euclidean distance sense, because the performance of the resulting 
vector quantiser is usually measured by the MSE criterion. The function to minimise 
may be written as: 

$$ D = \int \| x - y \|^2 f(x)\, dx, \qquad x, y \in \mathbb{R}^k $$



Here the || • || operator represents the Euclidean distance. The optimal solution to the 
minimisation of D with respect to y requires the joint minimisation of the nearest-
neighbour condition and the centroid condition. 

The nearest-neighbour condition describes the optimal encoder given a fixed decoder. 
This condition results in the input space being partitioned into regions, $R_i$, which may 
be termed as k-dimensional 'volumes of influence' of the fixed codewords, $c_i = y_i$, in 
the codebook, C. The optimal region partitions are such that: 

$$ R_i = \left\{ x : \| x - c_i \|^2 \le \| x - c_j \|^2 \right\}, \qquad j = 1 \dots N $$

Therefore: 

$$ \| x - y \|^2 = \min_i \left\{ \| x - c_i \|^2 \right\} $$

The region is chosen to minimise the squared error distortion with the given 
codebook. 

The centroid condition describes the optimal decoder given a fixed encoder. The 
distortion integral, D, may be rewritten as: 

$$ D = \sum_{i=1}^{N} P_i \int \| x - c_i \|^2 f(x \mid x \in R_i)\, dx $$

Here $P_i$ is the probability that x is in $R_i$, $P_i = \mathrm{Prob}[x \in R_i]$, and $f(x \mid x \in R_i)$ is the 
conditional pdf of x given that it lies in $R_i$. The regions, $R_i$, are fixed and therefore 
each conditional term may be separately minimised, provided that $P_i$ is non-zero. 
Therefore the centroid of region $R_i$ is defined as that output vector, $y_i = c_i$, which 
minimises the distortion between itself and all vectors x, where $x \in R_i$, over the 
entire conditional pdf. 

$$ y_i = \arg\min_{y} \int \| x - y \|^2 f(x \mid x \in R_i)\, dx $$

Under the squared error distortion criterion the optimal solution is the centroid of each 
region: 

$$ y_i = \int x\, f(x \mid x \in R_i)\, dx $$



An iterative batch mode algorithm may be used to find an optimal or near-optimal 
solution to these two conditions for a given input distribution. The process involves 
finding an optimal region partition for a given codebook and then finding the optimal 
codebook for the given partitions. Many such algorithms and operational derivatives 
exist for this process [Linde Y., Buzo A., Gray R.M. An Algorithm for Vector 
Quantizer Design. IEEE Transactions on Communications, Vol. 28, January 1980, pp. 
84 - 95; Gersho et al., see above]. 
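
One pass of such an iterative batch mode algorithm can be sketched as follows, assuming NumPy and a finite set of training samples in place of the pdf. The function name `lloyd_iteration` and the handling of empty regions (the codeword is left unchanged) are assumptions made for illustration.

```python
import numpy as np

def lloyd_iteration(samples, codebook):
    """Apply the nearest-neighbour and centroid conditions once.

    samples  : array of shape (n, k), training vectors
    codebook : array of shape (N, k), current codewords
    Returns the updated codebook and the mean squared error distortion
    measured with the codebook that was passed in.
    """
    # Nearest-neighbour condition: assign each sample to its closest codeword.
    dists = ((samples[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    distortion = float(dists[np.arange(len(samples)), nearest].mean())

    # Centroid condition: move each codeword to the mean of its region.
    new_codebook = codebook.copy()
    for i in range(len(codebook)):
        members = samples[nearest == i]
        if len(members) > 0:          # empty regions keep their old codeword
            new_codebook[i] = members.mean(axis=0)
    return new_codebook, distortion
```

Repeating the call with the returned codebook gives a distortion sequence that does not increase, since each of the two steps can only lower the squared error or leave it unchanged.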

A vector quantiser noise model is now described. Consider the gradient descent 
training process of an optimal high resolution vector quantiser. During the early 
stages of the training process there is a large error between the input, x, and output, y. 
As the process continues and the global minimum is approached, the error decays to 
some small value, which, in the high resolution case, may be made arbitrarily small. If 
the error is assumed to be independent of the input, then it may be modelled by a 
smooth zero-mean Gaussian distributed random variable. The model in figure 7 may 
be modified to produce the noise model of a vector quantiser [Luttrell S.P. Self-
organization: A derivation from first principles of a class of learning algorithms. IEEE 
Conference on Neural Networks, Washington, DC, 1989, pp. 495 - 498] shown in 
figure 8. 

For this model, the optimal vector quantiser for vectors, x, taken from a sample space 
defined by the underlying pdf, f(x), in a squared error sense, is one that minimises the 
long term average distortion defined as: 

$$ D = \int \left[ \int \| x - y(c(x) + n) \|^2\, \pi(n)\, dn \right] f(x)\, dx $$

Here $\pi(n)$ is the pdf of the additive noise. The optimal encoder for a given decoder 
then minimises the partial distortion measure: 

$$ \int \| x - y(c(x) + n) \|^2\, \pi(n)\, dn $$

For a given input vector, x, find the minimum distortion codeword, c(x), for all 
possible noise additions. The realisation of this equation may be simplified by 
assuming that $\pi(n)$ is smooth and the training process is nearing completion, so $n \to 0$ 
and therefore $\pi(n) \to 1$. The minimum condition then reduces to the nearest-
neighbour condition of the previous noiseless model. The best-matching codeword, 
c(x), from the vector quantiser codebook, C, may be completely defined by its index, 
$i \in \{1 \dots N\}$, in the codebook (and vice versa). 

$$ c(x) = \arg\min_{c_j} \left\{ \| x - y(c_j(x) + n) \|^2 \right\}, \qquad j = 1 \dots N $$

The optimal decoder for a given encoder is found by minimising the distortion 
measure with respect to the output, y. 

$$ \frac{\partial D}{\partial y} = -2 \int \left[ (x - y)\, \pi(n) \right] f(x)\, dx $$

Setting to zero and solving results in the centroid condition that may be used in an 
iterative batch mode algorithm. 

$$ y = \frac{\int x\, \pi(n)\, f(x)\, dx}{\int \pi(n)\, f(x)\, dx} $$

However, a gradient descent algorithm for the decoder output vectors follows directly. 
Note that y is a function of c, and n = c - c(x). Therefore, randomly sampling the 
input space that has a distribution defined by the pdf, f(x), results in the following 
update: 

$$ y_{t+1}(c_j) = y_t(c_j) + \eta_t\, \pi(c_j - c_i(x_t)) \left[ x_t - y_t(c_j) \right], \qquad j = 1 \dots N $$

Here $\eta_t$ is the adaptive learning rate parameter and $c_i(x_t)$ is defined by the best-
matching codeword from the nearest-neighbour condition at iteration t. If a zero-
mean Gaussian function for $\pi(n)$ is imposed on the learning process, then the 
codewords, $c_j$, centred on the best-matching codeword, $c_i(x_t)$, will be 'brought nearer' 
in a squared error sense, to the centre codeword. Furthermore, imposing an initially 
large radius Gaussian function and then decreasing it to zero during the training 
process results in a topologically ordered codebook, C. 

The stochastic gradient descent algorithm for the encoder-decoder noise model is the 
same as the standard SOM algorithm with neuron indices defined by i = {1...N} and 
the neuron weights defined by $y_i$. The noise shape defines the neighbourhood function 
and ensures the topologically ordered property of the output space. Therefore the 
SOM algorithm will generate an optimal (or at least, near optimal) vector quantiser. 
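
The stochastic gradient descent training described above can be sketched as follows, assuming NumPy, a one-dimensional arrangement of neuron indices and a Gaussian neighbourhood function. The function name `train_som`, the linear decay schedules for the learning rate and neighbourhood radius, and the initialisation from randomly chosen samples are all assumptions made for illustration.

```python
import numpy as np

def train_som(samples, n_codewords, epochs=20, seed=0):
    """Train a one-dimensional SOM as a vector quantiser codebook.

    samples     : array of shape (n, k), training vectors
    n_codewords : number of neurons (codewords), N
    Returns the weight vectors, one row per neuron index.
    """
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(samples), n_codewords, replace=False)
    weights = samples[picks].astype(float)
    index = np.arange(n_codewords)
    steps = epochs * len(samples)
    t = 0
    for _ in range(epochs):
        for x in samples[rng.permutation(len(samples))]:
            frac = t / steps
            eta = 0.5 * (1.0 - frac) + 0.01            # learning rate
            sigma = max(0.5 * n_codewords * (1.0 - frac), 1e-3)
            # Nearest-neighbour condition selects the winning neuron.
            winner = int(np.argmin(((weights - x) ** 2).sum(axis=1)))
            # Gaussian neighbourhood centred on the winner (the noise shape).
            pi = np.exp(-((index - winner) ** 2) / (2.0 * sigma ** 2))
            weights += eta * pi[:, None] * (x - weights)
            t += 1
    return weights
```

Early in training the wide neighbourhood pulls many codewords towards each sample, and as the radius shrinks the update approaches that of the winning codeword alone.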

A rate-constrained self-organising neural map (SOM) is now described. Consider the 
application of a SOM trained as an optimal vector quantiser in a signal compression 
environment. The vector samples, x, are extracted from the signal and the index of the 
neuron whose weight vector has the lowest squared error distortion with x is 
transmitted on the channel to the receiver. The receiver decodes the output vector, y, 
in a weight vector look-up table whose neuron index, i, is that received from the 
channel. Therefore the information conveyed over the communication channel is 
completely contained in the index. This raises the issue of efficient symbolic 
representation of the transmitted indices. Since only binary alphabet symbols are 
considered here, $A \in \{0, 1\}$, the index is represented by a variable length code, v(i), 
whose average bit rate is upper-bounded by the uniform neuron firing distribution 
case of $B = \log_2 N$ bits per vector. The bit rate or bit length of the code will be denoted 
by its magnitude, |v(i)|. 
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For any joint pdf ff,) of the input vector, * with dimension k, such that as 
X->±». there exists an arbitrary low distortion SOM where N is finite and the neuron 
Snug probabilities are described by a probability mass function (pmf). p(i(x)>. ^ 
p(Kx)) * f(x). This premise is based on the density marching properties of the 
tnuned SOM fKohonen T. Seif^^ny ^ Springer-Verlag, New York, 2«- 
Edmon, 1997; Haykin S. Neural Networks A Comprehensive Foundation. Macmillan 
College Publishing Company. New York, 1994]. An entropy coding method is 
therefore more efficient for transmitting the indices. From an information viewpoint 
the average entropy, in bits, is defined as: 

    \[ H(i) = -\sum_{i=1}^{N} P_i \log_2 P_i \]



Here P_i is the a posteriori probability of index i being the winning neuron, which is 
identical to the vector quantiser definition, P_i = Prob[x ∈ R_i]. If a prefix-free variable 
length binary code is used and allowing non-integer code lengths, then the average 
length is the index entropy [Gersho et al.; see above]. Therefore the length of the code 
to represent the index, i, is defined as: 

    \[ |v(i)| = -\log_2 P_i \]

Note that practical entropy codes must possess integer multiple lengths and therefore 
this acts as an asymptotic lower bound. The long term average bit rate generated from 
the trained SOM is written as: 

    \[ B = \sum_{i=1}^{N} P_i \, |v(i)| \]
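The relationship between firing probabilities, ideal code lengths and average bit rate can be checked numerically. The following is a minimal sketch, assuming the probabilities P_i are estimated from neuron firing counts; the function name `index_statistics` and the example counts are illustrative and do not come from the specification.

```python
import math

def index_statistics(firing_counts):
    """Estimate P_i from firing counts, then derive ideal code lengths,
    index entropy and average bit rate (all in bits)."""
    if any(count <= 0 for count in firing_counts):
        raise ValueError("every neuron needs a positive firing count")
    total = sum(firing_counts)
    probabilities = [count / total for count in firing_counts]
    # Ideal (possibly non-integer) code length: |v(i)| = -log2(P_i)
    code_lengths = [-math.log2(p) for p in probabilities]
    # Index entropy: H = -sum(P_i * log2(P_i))
    entropy = -sum(p * math.log2(p) for p in probabilities)
    # Long term average bit rate: B = sum(P_i * |v(i)|)
    average_rate = sum(p * length for p, length in zip(probabilities, code_lengths))
    return probabilities, code_lengths, entropy, average_rate

counts = [4, 2, 1, 1]  # hypothetical firing counts for a 4-neuron map
probabilities, code_lengths, entropy, average_rate = index_statistics(counts)
print(code_lengths)           # [1.0, 2.0, 3.0, 3.0]
print(entropy, average_rate)  # 1.75 1.75
print(math.log2(len(counts))) # 2.0, the uniform-distribution upper bound
```

With ideal code lengths the average rate equals the entropy, and both stay below the log2 N bound of the uniform firing case.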



The rate-constraint is constructed from the trade-off between rate (required for 
compression) and distortion (required for image quality). For low bit rate coding we 
wish to sacrifice image quality for higher compression, but a quantitative analysis in 
the given coding environment is required in order to make a prudent choice of an 
operational point. An approximate operational distortion-rate function, for a fixed 
vector dimension and [illegible], [illegible] the theory underlying entropy-constrained vector 
quantisation [Chou P.A., Lookabaugh T., Gray R.M., Entropy-Constrained Vector 
Quantization, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 
37, January 1989, pp. 31-42]. 



Here E[ ] is the statistical expectation operator of a stationary random process. This 
equation defines the bounds of the region in the distortion-rate plane within which we 
are able to operate with the parameter restrictions of our particular coding 
environment. 



Consider the trained SOM where the neighbourhood function has decayed to zero and 
hence, assuming the high resolution case, the noise pdf, π(n), is approximated by the 
Dirac function. The operational average distortion, D*, is a minimum in the sense that 
it minimises the function: 

    \[ D = E[\|x - y\|^2] = \int \|x - y\|^2 f(x)\,dx \]



The nearest-neighbour and centroid conditions result in an optimal partitioning of the 
SOM weight space into the regions, R_i. The rate constraint, B, which is approximated 
by H(i), is introduced with a Lagrange multiplier, λ, to generate the operational 
distortion-rate function to be minimised: 

    \[ D^*(R) = \sum_{i=1}^{N} \int_{R_i} \left( \|x - y_i\|^2 - \lambda \log_2 P_i \right) f(x \mid x \in R_i)\,dx \]



The integral for the noise pdf, π(n), has been omitted for simplicity of representation 
but is required for the training process as the neighbourhood function. Note that the 
λ log_2 P_i term is a constant and therefore affects the nearest-neighbour condition but 
does not affect the centroid condition. The rate-constrained SOM is trained with the 
gradient descent algorithm to minimise the instantaneous cost function described as: 

A rate-constrained SOM algorithm is now described. For the training process of the 
SOMs, that are used as vector quantisers, to be considered meaningful, an appropriate 
training set must be formed. The training set must represent the input space in the 
sense that its pdf is an approximation to the pdf of the input space, which will be applied 
to the vector quantiser during normal operation. In a video coding environment where 
the difference images are quantised within a first order coding loop, the quantisation 
error will be propagated to the reference image. The error will be superimposed on the 
next difference image and will appear as an input to the vector quantiser. In this way 
quantisation errors will accumulate in time as subjectively annoying artefacts. This 
effect is exaggerated in low bit rate codecs where the number of neurons in the SOMs 
is restricted and the high resolution assumptions are not strictly valid. The problem 
may be reduced by dynamically adapting the training set during the training process. 

The zero vectors are generally coded via some other mechanism (either a run-length 
approach or using a code/no code bit) and hence the training set consists only of 
non-zero vectors. The training set is formed from typical image differences and 
the addition of errors that the vector quantisers will make after training. However, 
knowledge of the final state of the quantisers is unknown during the training process, 
therefore error approximation is based on the current state. The applied training set is 
[illegible]. Random samples, x_1 and x_2, are selected from the difference image training 
set, T, at iteration t: 

    \[ \{x_1(t), x_2(t)\} \in T \]

[illegible] of the SOM is [illegible] the effect of this error in the coding loop [illegible] 
second training [illegible] for the SOM [illegible] described as: 

    \[ T' = \{x_1(t), x_e(t)\} \]

Here: 

    \[ x_e(t) = x_2(t) - (x_1(t) - c_{i_1}(t)) \]

The training process continues in this way with a difference image sample and an 
error adapted sample at each iteration. After initialising the SOM neuron weights with 
small random values and setting the occurrence frequency, F_j, to unity for each neuron, 
the training proceeds by repeating the steps 1 to 5 defined as follows: 

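Step 1: Randomly sample the training space to establish {x_1(t), x_2(t)}. 
Step 2: Determine the first winning neuron, i_1: 

    \[ i_1 = \arg\min_j \left( \|x_1(t) - c_j(t)\|^2 - \lambda \log_2 P_j \right), \quad j = 1 \ldots N \]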

Step 3: Update the neighbourhood neuron weight values and the winning neuron 
occurrence frequency: 

    \[ c_j(t+1) = c_j(t) + \eta(t)\, h_{j,i_1}(t)\, [x_1(t) - c_j(t)], \quad j = 1 \ldots N \]
    [occurrence frequency update illegible] 

Here η(t) is the exponentially decaying learning rate parameter and h_{j,i_1}(t) is the 
Gaussian exponential neighbourhood function with a linearly decaying radius centred 
on the winning neuron, i_1. 

Step 4: Determine x_e(t) and find the second winning neuron, i_2: 

    \[ i_2 = \arg\min_j \left( \|x_e(t) - c_j(t)\|^2 - \lambda \log_2 P_j \right), \quad j = 1 \ldots N \]

Step 5: Update the second winning neuron weight values and occurrence frequency: 

    \[ c_{i_2}(t+1) = c_{i_2}(t) + \eta(t)\, [x_e(t) - c_{i_2}(t)] \]
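Steps 1 to 5 can be illustrated with a short sketch. It is not the applicant's implementation: the one-dimensional neuron lattice, the schedule constants, the unit increment of the occurrence frequency, the estimate P_j = F_j / sum(F), the use of the rate-constrained cost for both winners, and the name `train_rate_constrained_som` are all assumptions made for illustration. The error adapted sample is formed as x_2 minus the quantisation error of x_1, which is how the expression in the source appears to read.

```python
import math
import random

def train_rate_constrained_som(samples, num_neurons, lam, iterations, seed=0):
    """Sketch of rate-constrained SOM training on a 1-D lattice (assumed)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Small random initial weights; occurrence frequencies start at unity.
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_neurons)]
    freq = [1.0] * num_neurons

    def winner(x):
        # Minimise squared error minus lam * log2(P_j), with P_j = F_j / sum(F).
        total = sum(freq)
        best_j, best_cost = 0, float("inf")
        for j, c in enumerate(weights):
            dist = sum((a - b) ** 2 for a, b in zip(x, c))
            cost = dist - lam * math.log2(freq[j] / total)
            if cost < best_cost:
                best_j, best_cost = j, cost
        return best_j

    for t in range(iterations):
        frac = t / iterations
        eta = 0.5 * math.exp(-3.0 * frac)            # exponential decay (assumed constants)
        radius = (num_neurons / 2.0) * (1.0 - frac)  # linear decay (assumed start value)

        # Step 1: draw two samples from the difference image training set.
        x1, x2 = rng.choice(samples), rng.choice(samples)
        # Step 2: first winning neuron.
        i1 = winner(x1)
        # Quantisation error of x1 against the current winner weights.
        err = [a - b for a, b in zip(x1, weights[i1])]
        # Step 3: neighbourhood update and frequency count for the first winner.
        for j in range(num_neurons):
            h = math.exp(-((j - i1) ** 2) / (2.0 * radius ** 2))
            weights[j] = [c + eta * h * (a - c) for a, c in zip(x1, weights[j])]
        freq[i1] += 1.0
        # Step 4: error adapted sample and second winning neuron.
        xe = [b - e for b, e in zip(x2, err)]
        i2 = winner(xe)
        # Step 5: update only the second winner and its frequency.
        weights[i2] = [c + eta * (a - c) for a, c in zip(xe, weights[i2])]
        freq[i2] += 1.0
    return weights, freq

# Hypothetical 2-dimensional training vectors.
samples = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)] * 5
weights, freq = train_rate_constrained_som(samples, num_neurons=4, lam=0.1, iterations=200)
```

Setting `lam` to zero reduces both winner searches to the ordinary nearest-neighbour rule, which gives a simple way to compare the constrained and unconstrained maps on the same training set.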
The heuristics of constraint parameter selection are now described. For a DWT- 
domain video coding environment each DWT sub-band exhibits differing statistical 
properties and probability distributions, therefore optimal rate-distortion selections are 
based on different criteria depending on the sub-band. The sub-band multidimensional 
probability distributions cannot be assumed to be Gaussian, nor are infinite resolution 
approximations necessarily applicable to finite size SOMs. The training set is a 
sample collection of the input space and therefore the underlying pdf is not smooth. 

In practical implementations the choice of sub-band vector dimension is limited by 
the image size. For example, QCIF image sizes permit vector dimensions of 4x4, 2x2, 
2x2 and 1x3 at sub-band levels of 1, 2, 3 and 4, respectively. Furthermore, the SOM 
dimensions are restricted by the need for practical variable length codes. SOM sizes 
of 8x8, 16x16, 32x32 and 64x64 neurons are considered practical. Operational rate- 
distortion points versus λ plots are generated for all sub-bands and colour components 
and used to empirically select generic optimal points for constructing the vector 
quantisers. 

Consider the results for the luminance, level 3 and LxHy DWT sub-band at four 
different SOM sizes for a 2x2 dimensional vector, shown in figure 9. 

The overall trendline shows the distortion-rate characteristic for the choice of SOM 
size. The characteristic could be described as a cost curve where the operating point is 
chosen depending on the slope at that point. A low valued slope (> 1.75 bpp in figure 9) 
implies a small distortion cost per coding bit. The large negative slope (< 1.75 bpp) 
region implies a large bit cost for an improved distortion. From this line, the 32x32 
SOM may be chosen as the most optimal under these operating parameters. However, 
for low bit rate coding the 16x16 SOM will give a gain of ≈0.5 bpp for an average 
loss of ≈1.0 pixel mean square error (MSE). In this way the operating point is chosen 
depending on the coding environment. 

[illegible] the typical non-increasing convex shape of high resolution smooth pdf 
vector quantisers. However, as the number of neurons of the SOM is decreased and 
therefore the further away from the high resolution assumptions the vector quantiser 
operates, two phenomena begin to appear. A clear operational minimum distortion [...] 

For the low resolution points it is conceivable that these phenomena may be attributed 
to a diminishing difference between the global minimum and the local minima for the 
now large 'volumes of influence' of each neuron in the SOM. 

The selected 32x32 SOM shows that the locally optimal operating point is at a 
minimum. Each sub-band is analysed in the same way to produce an operational SOM 
size and Lagrange multiplier, λ. 

Two video test sequences have been used to evaluate the basic embodiment of the 
method and the extended embodiment of the method. The test sequences are typical 
for video telephony environments, and are of the same subject at differing distances 
from the camera. The training difference images for the SOMs were not constructed 
from any of the test sequences and were taken from a different camera. The first 
image of the first test sequence is shown in figure 10, and the first image of the 
second test sequence is shown in figure 11. 

The test sequences consist of colour images with 24 bits/pixel (bpp) (being 8 bits for 
red, green and blue components respectively) and with a QCIF pixel resolution (176 x 
144 pixels). The frame rate is a constant 10 frames/s. For the purposes of 
comparison, the measure of output quality is considered from a peak signal to noise 
ratio (PSNR) perspective that is defined from the mean square error (MSE) between 
the original and the output images on a pixel-by-pixel basis for all colour components. 
For an input image, I_i, and a reconstructed output image, I_o, with pixel dimensions 
M x N and C colour components, the MSE is defined as: 

    \[ MSE = \frac{1}{C M N} \sum_{c=1}^{C} \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ I_i(m,n,c) - I_o(m,n,c) \right]^2 \]

The PSNR is therefore defined as: 

    \[ PSNR = 10 \log_{10} \left( \frac{255^2}{MSE} \right) \; [dB] \]
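The quality measure can be computed directly from its definition. The sketch below assumes 8-bit colour components (peak value 255, consistent with the 24 bits/pixel stated above) and images stored as nested lists indexed [row][column][component]; the function names and the tiny example images are illustrative only.

```python
import math

def mse(original, output):
    """Mean square error over all pixels and all colour components."""
    total, count = 0.0, 0
    for row_a, row_b in zip(original, output):
        for pixel_a, pixel_b in zip(row_a, row_b):
            for a, b in zip(pixel_a, pixel_b):
                total += (a - b) ** 2
                count += 1
    return total / count

def psnr(original, output, peak=255.0):
    """Peak signal to noise ratio in dB; infinite for identical images."""
    error = mse(original, output)
    if error == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / error)

# Hypothetical 1 x 2 pixel RGB images differing in two components by 2.
original = [[(10, 20, 30), (40, 50, 60)]]
output = [[(12, 20, 30), (40, 50, 58)]]
error = mse(original, output)    # (4 + 4) / 6
quality = psnr(original, output)
```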
For bit rate control a quality factor parameter was used to scale the DWT 
psychophysical model. An arbitrary quality factor range of q = {0...31} was chosen 
for the psychophysical scaling, S, applied as a division factor for quantisation and a 
multiplier for inverse quantisation: 

    \[ S = 1 + \frac{q}{4} \]

A threshold factor was applied to the difference image coefficients. The thresholding 
was applied in the following manner. If the absolute value of the coefficient was less 
than the threshold value then it was set to zero. Otherwise, the threshold was 
subtracted from (added to) the positive (negative) coefficient. The purpose of the 
subtraction (addition) was to further 'encourage' small valued coefficient vectors. 
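The thresholding rule described above can be written as a short function. This is a minimal sketch of the rule as stated; the function name `apply_threshold` and the sample coefficients are illustrative.

```python
def apply_threshold(coefficients, threshold):
    """Zero coefficients below the threshold in magnitude and shrink the rest
    towards zero by the threshold value."""
    result = []
    for c in coefficients:
        if abs(c) < threshold:
            result.append(0)              # below threshold: set to zero
        elif c > 0:
            result.append(c - threshold)  # positive: subtract the threshold
        else:
            result.append(c + threshold)  # negative: add the threshold
    return result

thresholded = apply_threshold([5, -5, 1, -1, 2, -2, 0], 2)
print(thresholded)  # [3, -3, 0, 0, 0, 0, 0]
```

Note that a coefficient whose magnitude equals the threshold is not zeroed by the first rule but still becomes zero after the subtraction or addition.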

The algorithm was applied with target bit rates of 10k bits/s, 28.8k bits/s and 64k 
bits/s with the quality and threshold factors set according to Table 3. 

Test Sequence     Bit Rate (bits/s)   Quality Factor   Threshold 
First Sequence    10k                 3                2 
                  28.8k               2                2 
                  64k                 1                2 
Second Sequence   10k                 [illegible]      2 
                  28.8k               1                2 
                  64k                 1                1 



A section of the distortion results from image frame numbers 200 to 300 for the two 
sequences, comparing the basic method and the extended motion compensated 
method, for 10k bits/s, 28.8k bits/s and 64k bits/s are shown in figures 12, 13 and 14 
respectively. The images were decomposed to 4 levels with the DWT but the motion 
compensation was performed only for levels 1 to 3. 



[illegible] as the bit rate is decreased from 64k bits/s down to 10k bits/s. The 64k bits/s 
case provides ≈1.5 dB gain, the 28.8k bits/s case ≈0.5 dB and approximately the same 
performance at 10k bits/s. The DWT domain motion estimation is more accurate in 
the high resolution sub-bands than at the low. At very low bit rates the constant frame 
[illegible] will not reach the higher resolutions. Therefore the contribution of the motion 
compensation to the gain becomes limited, considering that it only begins at DWT 
level 3. This is more apparent for scenes with higher temporal activity as in the first 
sequence where the basic method actually performs better. Here, the bit cost of 
coding the motion vectors outweighs their potential quality gain, although the 
difference is small. 

Note that the sub-band vector quantisers are trained on and hence optimised for 
difference images that exclude motion compensation. Including motion compensated 
data in the training process should improve the performance of the extended method. 

The effect of the constant frame bit constraint is apparent in the shape of the distortion 
shown in figures 12 to 14, and is consistent at all bit rates. Any sudden temporal 
activity in the sequence results in a large negative slope of PSNR (see for example 
frames 230 to 235 of the second sequence). There is more energy in the difference 
information that requires coding, and this means that either more bits must be used 
and/or the quality must decrease (distortion-rate trade-off). The constant bit 
constraint means that the quality is sacrificed. If the sudden temporal activity is 
followed by a period of low activity, the method will use this 'bit respite' to recover 
the quality of the coded image. This is indicated by positive PSNR slope following 
the decrease (see for example frames 236 to 250 of the second sequence). Note that 
the positive recovery slope is less than the negative quality degradation slope. 

Claims 



1. An image coding method comprising generating an ordered sequence of coded 
image data, the sequence beginning with coded data representative of an area of the 
image having high importance, and ending with coded data representative of an area 
of the image having lower importance. 

2. An image coding method according to claim 1, wherein the importance of the 
image areas represented by the coded data decreases gradually over the ordered 
sequence. 

3. An image coding method according to claim 2, wherein the image data coding 
sequence is arranged in a substantially spiral configuration centred on the area of 
importance. 

4. An image coding method according to any preceding claim, wherein the area 
of importance is at a location selected as the most likely centre point of foveated 
vision of a viewer of the image. 

5. An image coding method according to claim 4, wherein the area of importance 
is at a centre point of the image. 

6. An image coding method according to any preceding claim, wherein the 
method includes converting an image into a multi-resolution representation, different 
resolution representations of the image being coded in sequence, the order of the 
sequence being determined to reflect psychophysical aspects of human vision. 

7. An image coding method according to claim 6, wherein according to the 
sequence a luminance representation of the image is coded before chrominance 
representations of the image. 

8. An image coding method according to claim 7, wherein for a given level of 
resolution, the luminance representation is arranged to include more resolution than 
the chrominance representations. 



WO 01/15457 PCT/GB00/03156 



9. An image coding method according to any of claims 6 to 8, wherein the 
multi-resolution representation is generated using a wavelet transform, and the coding 
sequence comprises wavelet representations of the image which increase from a low 
level of resolution to a high level of resolution. 

10. An image coding method according to claim 9, wherein wavelet orientations 
of horizontal and vertical image components are coded before wavelet orientations of 
diagonal image components. 

11. An image coding method according to claim 10, wherein wavelet orientations 
of diagonal image components of a given level of resolution are coded after wavelet 
orientations of horizontal and vertical image components of a higher resolution. 

12. An image coding method according to any preceding claim, wherein the 
method is implemented as part of a communications system, and the amount of coded 
information output by the method for a given image is determined on an image by 
image basis in accordance with the available bandwidth of the communications 
system. 

13. An image coding method according to claim 12, wherein where necessary in 
order to fully utilise the available bandwidth of the communications system includes a 
truncated sequence of coded image data, image data representative of areas of least 
importance having been excluded from the truncated sequence. 

14. An image coding method according to any preceding claim, wherein a 
predetermined code is added to a sequence to indicate the end of image data 
representative of a particular aspect of the image. 

15. An image coding method according to any preceding claim, wherein the image 
is one of a sequence of images, the image is compared to a reference image 
determined using preceding images of the sequence, and the coding method is used to 
code differences between the image and the reference image. 

16. An image coding method according to any preceding claim, wherein scalar 
quantisation is used to minimise the amount of image data to be coded, the scalar 
quantisation being based upon a psychophysical model. 

17. An image coding method according to any preceding claim, wherein the 
method includes an estimation of motion within an image as compared with a 
reference image, and the estimated motion is included in the coded image data. 

18. An image coding method according to claim 17, wherein the method includes 
a choice between image data that has been coded using motion estimation and data 
that has been coded without using motion estimation, the choice being made upon the 
basis of minimising distortion of the coded image. 

19. An image coding method according to any preceding claim, wherein the 
method includes vector quantisation of the image, the vector quantisation being 
implemented using a self organising neural map to provide image data in the form of 
indices of a codebook. 



20. An image coding method according to claim 9 and claim 19, wherein a 
threshold is applied to the magnitude of wavelet coefficients, and those which fall 
below the threshold are converted to zero coefficients. 

21. An image coding method according to claim 9 and claim 19, wherein different 
codebooks are used for different sub-bands of the wavelet representation of the image. 

22. An image coding method according to any of claims 19 to 21, wherein the 
indices of the codebook are subsequently coded using variable length entropy coding. 

23. An image coding method according to claim 22, wherein a series of zero 
indices followed by a non-zero index is coded as a pair of values by the variable 
length entropy coding, a first value representing the number of zero indices in the 
series and the second value representing the value of the non-zero index. 

24. An image coding method according to claim 22 or claim 23, wherein a 
threshold is applied to the indices of the codebook, and those indices which fall below 
the threshold are converted to zero indices. 

25. An image coding method according to claim 24, wherein wavelet coefficients 
which fall above the threshold are reduced by the value of the threshold. 

26. A method of decoding an image coded in accordance with any preceding 
claim, wherein where a truncated sequence of coded image data is received, the 
decoder decodes the image using the truncated sequence of coded image data and uses 
zero values in place of missing coded image data. 

27. A method of decoding an image according to claim 26, wherein the coded 
image is a difference image which is added to a reference image to generate a 
decoded image, and artefacts at higher resolutions of the decoded image caused by the 
truncated sequence are removed by setting the higher resolution data to zero. 

28. An image coding method substantially as hereinbefore described with 
reference to the accompanying figures. 

Form PCT/IPEA/409 (cover sheet) (January 1994) 



INTERNATIONAL PRELIMINARY 
EXAMINATION REPORT 



International application No. PCT/GB00/03156 



I. Basis of the report 

1. With regard to the elements of the international application (Replacement sheets which have been furnished to 
the receiving Office in response to an invitation under Article 14 are referred to in this report as "originally filed" 
and are not annexed to this report since they do not contain amendments (Rules 70.16 and 70.17)): 
Description, pages: 

1-3, 5-34 as originally filed 

4, 4a as received on 14/11/2001 with letter of 06/11/2001 
Claims, No.: 

1-25 as received on 14/11/2001 with letter of 06/11/2001 
Drawings, sheets: 

1/12-12/12 as received on 13/10/2000 



2. With regard to the language, all the elements marked above were available or furnished to this Authority in the 
language in which the international application was filed, unless otherwise indicated under this item. 

These elements were available or furnished to this Authority in the following language: , which is: 

□ the language of a translation furnished for the purposes of the international search (under Rule 23.1(b)). 

□ the language of publication of the international application (under Rule 48.3(b)). 

□ the language of a translation furnished for the purposes of international preliminary examination (under Rule 
55.2 and/or 55.3). 

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the 
international preliminary examination was carried out on the basis of the sequence listing: 

□ contained in the international application in written form. 

□ filed together with the international application in computer readable form. 

□ furnished subsequently to this Authority in written form. 

□ furnished subsequently to this Authority in computer readable form. 

□ The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in 
the international application as filed has been furnished. 

□ The statement that the information recorded in computer readable form is identical to the written sequence 
listing has been furnished. 

4. The amendments have resulted in the cancellation of: 

□ the description, pages: 

☒ the claims, Nos.: 

□ the drawings, sheets: 

5. □ This report has been established as if (some of) the amendments had not been made, since they have been 
considered to go beyond the disclosure as filed (Rule 70.2(c)): 

(Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this 
report.) 

6. Additional observations, if necessary: 

III. Non-establishment of opinion with regard to novelty, inventive step and industrial applicability 

1. The questions whether the claimed invention appears to be novel, to involve an inventive step (to be 
non-obvious), or to be industrially applicable have not been examined in respect of: 

□ the entire international application. 
☒ claims Nos. 1, 25. 



□ the said international application, or the said claims Nos. relate to the following subject matter which does 
not require an international preliminary examination (specify): 

☒ the description, claims or drawings (indicate particular elements below) or said claims Nos. 1, 25 are so 
unclear that no meaningful opinion could be formed (specify): 
see separate sheet 

☒ the claims, or said claims Nos. 1, 25 are so inadequately supported by the description that no meaningful 
opinion could be formed. 

□ no International search report has been established for the said claims Nos. . 

2. A meaningful international preliminary examination cannot be carried out due to the failure of the nucleotide 
and/or amino acid sequence listing to comply with the standard provided for in Annex C of the Administrative 
Instructions: 

□ the written form has not been furnished or does not comply with the standard. 

□ the computer readable form has not been furnished or does not comply with the standard. 



because: 

INTERNATIONAL PRELIMINARY EXAMINATION REPORT - SEPARATE SHEET 

International application No. PCT/GB00/03156 



Re Item III 

No basis was found for "on adding the image to the reference image, setting the higher 
resolution data to zero" in the encoder (claim 1 and claim 25). This feature is even in 
contradiction with the description. Claims 1-25 are therefore not supported by the 
description as required by Article 6 PCT. Furthermore this amendment introduces 
subject-matter which extends beyond the content of the application as filed, contrary to 
Article 34(2)(b) PCT. 

On the paragraph bridging pages 14 and 15 of the description, it is explained that the 
decoder will set to zero the higher resolution data that are not coded for a current 
image and which were coded in the reference image. This is needed to ensure the 
visual quality of the decoded picture in case of shortage of bandwidth because the 
reference image used by the decoder is not the same as the reference image used by 
the encoder, as a skilled person would understand by reading the description pages 8 to 
14 (the decoder will have a lower resolution reference image). 
No clear indication could be found in the description that the encoder includes a 
mechanism to set the higher resolution data of the reference image in the coding loop 
to zero. 

The applicant tries to define a coding method with actions that are performed in a 
decoder. 



Form PCT/Separate Sheet/409 (Sheet 1) (EPO-April 1997) 




Error concealment strategies used by video coding methods based upon the H.263 algorithm 
are spatial and therefore show up as corrupted regions in reconstructed image frames. 

It is an object of the present invention to provide an image coding method which overcomes 
at least one of the above disadvantages. 

According to the invention there is provided an image coding method comprising generating 
an ordered sequence of coded image data, the sequence beginning with coded data 
representative of an area of the image having high importance, and ending with coded data 
representative of an area of the image having lower importance, wherein the image is one of a 
sequence of images, the image is compared to a reference image determined using preceding 
images of the sequence and the coding method is used to code differences between the image 
and the reference image in a coding loop, wherein when an image is coded to a lower 
resolution than an immediately preceding image, on adding the image to the reference image, 
artefacts at high resolution in the reference image are removed by setting the higher 
resolution data to zero so that the resolution of the reference image corresponds to the 
resolution of the image that was coded, thereby allowing the amount of data which is used to 
represent the coded images to be increased or decreased, to adjust the amount of coded data 
to match an available bandwidth. 

The invention also provides an image coding and decoding method comprising: 
generating an ordered sequence of coded image data, the sequence beginning with coded 
data representative of an area of the image having high importance, and ending with coded 
data representative of an area of the image having lower importance, wherein the image is 
one of a sequence of images, the image is compared to a reference image determined using 
preceding images of the sequence and the coding method is used to code differences between 
the image and the reference image in a coding loop, wherein when an image is coded to a 
lower resolution than an immediately preceding image, on adding the image to the reference 
image, artefacts at high resolution in the reference image are removed by setting the higher 
resolution data to zero so that the resolution of the reference image corresponds to the 
resolution of the image that was coded, thereby allowing the amount of data which is used to 
represent the coded images to be increased or decreased to adjust the amount of coded data to 
match an available bandwidth; and 
subsequently decoding the coded data by adding the coded data to a reference image 
in a coding loop, wherein when a coded image has been coded to a lower resolution than an 
immediately preceding image, on adding the coded image to the reference image during 
decoding, artefacts at high resolution in the reference image are removed by setting the 
higher resolution data to zero so that the resolution of the reference image corresponds to the 
resolution of the coded image. 

Preferably, the importance of the image areas represented by the coded data decreases 
gradually over the ordered sequence. 

Preferably, the image data coding sequence is arranged in a substantially spiral configuration 
centred on the area of importance. 

Preferably, the area of importance is at a location selected as the most likely centre point of 
foveated vision of a viewer of the image. 

Preferably, the area of importance is at a centre point of the image. 

Preferably, the method includes converting an image into a multi-resolution representation, 
different resolution representations of the image being coded in sequence, the order of the 
sequence being determined to reflect psychophysical aspects of human vision. 

Preferably, according to the sequence a luminance representation of the image is coded 
before chrominance representations of the image. 

Preferably, for a given level of resolution, the luminance representation is arranged to include 
more resolution than the chrominance representations. 
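The substantially spiral coding order centred on the area of importance can be illustrated with a short sketch. It assumes that the image (or sub-band) is divided into a rows x cols grid of blocks, that the area of importance is the centre block, and that the spiral runs clockwise starting to the right; the function name `spiral_order` and these choices are illustrative and are not taken from the specification.

```python
def spiral_order(rows, cols):
    """Return (row, col) block positions in a spiral starting at the centre."""
    r, c = rows // 2, cols // 2
    order = [(r, c)]
    dr, dc = 0, 1            # start by moving right
    step = 1
    while len(order) < rows * cols:
        for _ in range(2):   # each run length is used for two directions
            for _ in range(step):
                r, c = r + dr, c + dc
                if 0 <= r < rows and 0 <= c < cols:
                    order.append((r, c))   # skip positions outside the grid
            dr, dc = dc, -dr               # turn clockwise
        step += 1
    return order

order_3x3 = spiral_order(3, 3)
print(order_3x3)
# [(1, 1), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0), (0, 1), (0, 2)]
```

Truncating the resulting sequence at any point discards only the blocks furthest along the spiral, which are the ones furthest from the assumed area of importance.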






Claims 



1. An image coding method comprising generating an ordered sequence of coded image data, 
the sequence beginning with coded data representative of an area of the image having high 
importance, and ending with coded data representative of an area of the image having lower 
importance, wherein the image is one of a sequence of images, the image is compared to a 
reference image determined using preceding images of the sequence and the coding method is 
used to code differences between the image and the reference image in a coding loop, wherein 
when an image is coded to a lower resolution than an immediately preceding image, on adding the 
image to the reference image, artefacts at high resolution in the reference image are removed by 
setting the higher resolution data to zero so that the resolution of the reference image corresponds 
to the resolution of the image that was coded, thereby allowing the amount of data which is used 
to represent the coded images to be increased or decreased, to adjust the amount of coded data to 
match an available bandwidth. 

2. An image coding method according to claim 1, wherein the importance of the image areas 
represented by the coded data decreases gradually over the ordered sequence. 

3. An image coding method according to claim 2, wherein the image data coding sequence is 
arranged in a substantially spiral configuration centred on the area of importance. 

4. An image coding method according to any preceding claim, wherein the area of 
importance is at a location selected as the most likely centre point of foveated vision of a viewer 
of the image. 

5. An image coding method according to claim 4, wherein the area of importance is at a 
centre point of the image. 

6. An image coding method according to any preceding claim, wherein the method includes 
converting an image into a multi-resolution representation, different resolution representations of 
the image being coded in sequence, the order of the sequence being determined to reflect 
psychophysical aspects of human vision. 

7. An image coding method according to claim 6, wherein according to the sequence a 
luminance representation of the image is coded before chrominance representations of the image. 

8. An image coding method according to claim 7, wherein for a given level of resolution, the 
luminance representation is arranged to include more resolution than the chrominance 
representations. 

9. An image coding method according to any of claims 6 to 8, wherein the multi-resolution 
representation is generated using a wavelet transform, and the coding sequence comprises wavelet 
representations of the image which increase from a low level of resolution to a high level of 
resolution. 

10. An image coding method according to claim 9, wherein wavelet orientations of horizontal 
and vertical image components are coded before wavelet orientations of diagonal image 
components. 

11. An image coding method according to claim 10, wherein wavelet orientations of diagonal 
image components of a given level of resolution are coded after wavelet orientations of horizontal 
and vertical image components of a higher resolution. 
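
By way of non-limiting illustration, one sub-band ordering consistent with claims 9 to 11 might be sketched as follows. The labels `LL`, `H`, `V` and `D`, the convention that level 1 is the lowest resolution, and the function name `subband_order` are assumptions made for the sketch only; other orderings may equally satisfy the claims.

```python
def subband_order(levels):
    """List wavelet sub-bands in one possible coding order.

    Level 1 is taken to be the lowest resolution. Horizontal (H) and
    vertical (V) bands of each level precede its diagonal (D) band, and
    the diagonal band of level k is deferred until after the H and V
    bands of level k + 1.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    order = ["LL"]  # coarsest approximation first
    for level in range(1, levels + 1):
        order.append(f"H{level}")
        order.append(f"V{level}")
        if level > 1:
            # Diagonal band of the previous (lower) level comes only now.
            order.append(f"D{level - 1}")
    order.append(f"D{levels}")  # highest-resolution diagonal band last
    return order
```

For three levels this gives LL, H1, V1, H2, V2, D1, H3, V3, D2, D3.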

12. An image coding method according to any preceding claim, wherein the method is 
implemented as part of a communications system, and the amount of coded information output by 
the method for a given image is determined on an image by image basis in accordance with the 
available bandwidth of the communications system. 

13. An image coding method according to claim 12, wherein where necessary in order to fully 
utilise the available bandwidth of the communications system includes a truncated sequence of 
coded image data, image data representative of areas of least importance having been excluded 
from the truncated sequence. 
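
By way of non-limiting illustration, the truncation of claims 12 and 13 might be sketched as follows. The sketch assumes the coded data are held as a list of byte strings already ordered from most to least important, and that the available bandwidth is expressed as a byte budget per image; the names `truncate_to_budget`, `chunks` and `budget_bytes` are hypothetical.

```python
def truncate_to_budget(chunks, budget_bytes):
    """Keep the longest prefix of `chunks` that fits in `budget_bytes`.

    `chunks` is assumed to be ordered from most to least important, so
    the data dropped are those representing the least important areas.
    """
    kept = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > budget_bytes:
            break  # everything after this point is less important
        kept.append(chunk)
        used += len(chunk)
    return kept
```

Because the budget is applied per image, it may be recomputed for each image as the available bandwidth changes.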

14. An image coding method according to any preceding claim, wherein a predetermined code 
is added to a sequence to indicate the end of image data representative of a particular aspect of the 
image. 

15. An image coding method according to any preceding claim, wherein scalar quantisation is 
[...] quantisation being based upon a psychophysical model. 

16. An image coding method according to any preceding claim, wherein the method includes 
an estimation of motion within an image as compared with a reference image, and the estimated 
motion is included in the coded image data. 

17. An image coding method according to claim 16, wherein a choice is made 

between image data that has been coded using motion estimation and data that has been coded 
without using motion estimation, the choice being made upon the basis of minimising distortion 
of the coded image. 

18. An image coding method according to any preceding claim, wherein the method includes 
vector quantisation of the image, the vector quantisation being implemented using a self 
organising neural map to provide image data in the form of indices of a codebook. 

19. An image coding method according to claim 18 as dependent upon 9, wherein a threshold 
is applied to the magnitude of wavelet coefficients, and those which fall below the threshold are 
converted to zero coefficients. 

20. An image coding method according to claim 18 as dependent upon 9, wherein different 
codebooks are used for different sub-bands of the wavelet representation of the image. 

21. An image coding method according to any of claims 18 to 21, wherein the indices of the 
codebook are subsequently coded using variable length entropy coding. 

22. An image coding method according to claim 21, wherein a series of zero indices followed 
by a non-zero index is coded as a pair of values by the variable length entropy coding, a first 
value representing the number of zero indices in the series and the second value representing the 
value of the non-zero index. 
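
By way of non-limiting illustration, the pairing of claim 22 might be sketched as follows. Only the formation of (zero run, non-zero index) pairs is shown; the variable length entropy coding of those pairs is omitted. The function names and the separate count of trailing zeros are assumptions of the sketch.

```python
def zero_run_pairs(indices):
    """Convert codebook indices into (zero_run, non_zero_index) pairs.

    Returns the list of pairs and the number of zero indices left over
    at the end of the sequence (which have no non-zero index to close
    the run).
    """
    pairs = []
    run = 0
    for idx in indices:
        if idx == 0:
            run += 1
        else:
            pairs.append((run, idx))
            run = 0
    return pairs, run


def expand_pairs(pairs, trailing_zeros=0):
    """Inverse of zero_run_pairs: rebuild the index sequence."""
    out = []
    for run, idx in pairs:
        out.extend([0] * run)
        out.append(idx)
    out.extend([0] * trailing_zeros)
    return out
```

For example, the indices 0, 0, 5, 0, 3, 7 become the pairs (2, 5), (1, 3), (0, 7).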

23. An image coding method according to claim 21 or claim 22, wherein a threshold is applied 
to the indices of the codebook, and those indices which fall below the threshold are converted to 
zero indices. 

24. An image coding method according to claim 23, wherein wavelet coefficients which fall 
above the threshold are reduced by the value of the threshold. 
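
By way of non-limiting illustration, the thresholding of wavelet coefficients referred to in claims 19 and 24 might be sketched as follows. The sketch assumes that the threshold is compared with the magnitude of each coefficient and that the sign of a retained coefficient is preserved when its magnitude is reduced; the name `soft_threshold` is hypothetical.

```python
def soft_threshold(coeffs, threshold):
    """Zero small coefficients and shrink the remainder.

    Coefficients whose magnitude is below `threshold` become zero;
    the others have their magnitude reduced by `threshold`, keeping
    their original sign (an assumption of this sketch).
    """
    out = []
    for c in coeffs:
        magnitude = abs(c)
        if magnitude < threshold:
            out.append(0.0)
        else:
            sign = 1.0 if c >= 0 else -1.0
            out.append(sign * (magnitude - threshold))
    return out
```

With a threshold of 1.0, the coefficients 0.5, 3.0 and -4.0 become 0.0, 2.0 and -3.0 respectively.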



25. An image coding and decoding method comprising: 
generating an ordered sequence of coded image data, the sequence beginning with coded 
data representative of an area of the image having high importance, and ending with coded data 
representative of an area of the image having lower importance, wherein the image is one of a 
sequence of images, the image is compared to a reference image determined using preceding 
images of the sequence and the coding method is used to code differences between the image and 
the reference image in a coding loop, wherein when an image is coded to a lower resolution than 
an immediately preceding image, on adding the image to the reference image, artefacts at high 
resolution in the reference image are removed by setting the higher resolution data to zero so that 
the resolution of the reference image corresponds to the resolution of the image that was coded, 
thereby allowing the amount of data which is used to represent the coded images to be increased 
or decreased to adjust the amount of coded data to match an available bandwidth; and 

subsequently decoding the coded data by adding the coded data to a reference image in a 
coding loop, wherein when a coded image has been coded to a lower resolution than an 
immediately preceding image, on adding the coded image to the reference image during decoding 
artefacts at high resolution in the reference image are removed by setting the higher resolution 
data to zero so that the resolution of the reference image corresponds to the resolution of the 
coded image. 
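
By way of non-limiting illustration, the decoding step of claim 25 might be sketched as follows. The sketch assumes the reference image is held as a list of per-level coefficient lists, with index 0 the lowest resolution, and that the coded image supplies differences for only the first `coded_levels` levels; the name `update_reference` and this data layout are hypothetical.

```python
def update_reference(reference, coded_diff, coded_levels):
    """Add coded differences to the reference and clear uncoded levels.

    reference:    list of per-level coefficient lists (index 0 = lowest
                  resolution).
    coded_diff:   per-level difference lists for the first
                  `coded_levels` levels only.
    coded_levels: number of resolution levels actually coded.

    Levels above `coded_levels` are set to zero so that the reference
    matches the resolution of the image that was coded.
    """
    updated = []
    for level, ref_band in enumerate(reference):
        if level < coded_levels:
            diff_band = coded_diff[level]
            updated.append([r + d for r, d in zip(ref_band, diff_band)])
        else:
            # Higher resolution data not coded for this image: zero it.
            updated.append([0] * len(ref_band))
    return updated
```

If the preceding image was coded to the same or a lower resolution, the higher levels are already zero and the final branch leaves them unchanged.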